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Chapter 1 A) 
Introduction E 


1.1 The Origins of this Book 


Many interesting research projects begin with an apparently simple question. The 
question with which this project began arose in the context of managing an online 
course. 

In 2008, I and a colleague at the UK Open University were charged with designing 
and implementing a postgraduate distance-learning course with the title Accessible 
Online Learning: Supporting Disabled Students. The course was to be taken entirely 
online and would contribute to the University’s master’s programme in Online and 
Distance Education. All the course material was available online, either from a dedi- 
cated website or through the University library’s online resources. In particular, the 
course textbook (Seale, 2006) was available free for students as an e-book. The 
course would run annually from September to January and would be equivalent to 
one quarter of a year’s full-time study. 

We recruited four experienced associate lecturers to serve as tutors for the course. 
They were each assigned 10-20 students, and their duties were to moderate online 
discussion forums among the group of students and to assess the assignments 
submitted by each student. The assignments were to be submitted in Microsoft Word 
format through a dedicated online system, and the students were given advice and 
instructions with regard to essay structure and referencing. The tutors provided an 
overall evaluation and mark (out of 100%) on each of the assignments using a sepa- 
rate form, and they provided more specific feedback in the margins of the assignment 
using Word’s “comment” facility. 

During the first presentation of the course in 2008-2009, one of the tutors 
suggested that students should be required to submit their assignments using a sans 
serif typeface, on the basis that “everybody knew” that sans serif typefaces were 
easier to read on screen and that this would make the task of evaluating the students’ 
assignments more efficient. This seemed to overlook a number of points regarding 
the process of learning and assessment: 
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e Many students were using the default typeface in Microsoft Word to type their 
assignments. At the time, this was the serif typeface Times New Roman. 

e The idea that the appearance of written work could be changed to suit the potential 
readers’ abilities, skills, and preferences was introduced during the course. 

e A tutor who downloaded an assignment to provide feedback could change the 
appearance of the assignment to suit their own abilities, skills, and preferences. 

e Tutors might choose to evaluate assignments on screen, or they could instead 
choose to print off the assignments to read and evaluate them in hard copy. 


The last point raised the question of the legibility of serif and sans serif typefaces 
when they were used to produce material intended to be read on paper or other hard 
surfaces. Here, a cursory view of the literature suggested that “everybody knew” that 
serif typefaces were easier to read on paper than were sans serif typefaces. Of course, 
the fact that everybody knows something is no guarantee that it is true. Until the time 
of the Pythagorean School in the sixth century BC, “everybody knew” that the earth 
was flat; and until the time of Copernicus in the sixteenth century AD, “everybody 
knew” that the sun rotated around the earth. Nowadays, we expect such matters to be 
determined by empirical evidence, not by majority opinion. This book is concerned 
with the empirical evidence concerning the relative legibility of serif typefaces and 
sans serif typefaces: Part I is concerned with their legibility in “hard copy” (i.e., 
when presented on paper or other hard surfaces), and Part II is concerned with their 
legibility when presented on computer monitors or other screens. 


1.2 Serif Typefaces 


There are many dimensions on which typefaces can vary, but this book is concerned 
with the legibility of typefaces with and without serifs. A serif is “a short, light line 
projecting from the main stroke of a letter” (Chicago Manual of Style, 2003, p. 837) 
that often takes the form of a small finishing stroke at the top or bottom of a letter. 
(In English-speaking countries, typographers have variously spelled the word ceref, 
ceriph, seriff, seriph, seryph, surriph, surripse, surryph, or syrif: Mosley, 1999, pp. 
53, 55.) One feature of typefaces with serifs is that the main strokes constituting 
each letter are often of varying thickness. The left-hand panel of Fig. 1.1 shows 
some examples of common serif typefaces (Baskerville, Garamond, Palatino, and 
Times New Roman) as they are currently rendered in Microsoft Word. (The example 
sentence is a pangram—a single sentence that uses all 26 letters in the English 
alphabet—which is often used as a typing exercise.) 

Serifs have been noted in Greek inscriptions from the fourth century BC, but 
they became widely adopted during the Roman Empire (from 27 BC to AD 476) 
(Mosley, 1999, p. 18), when they are generally thought to have resulted from the 
practices of Roman masons (Bringhurst, 2019, pp. 119-120). It is likely that the 
latter used a flat or square-edged brush to draft symbols on pieces of stone before 
carving them with chisels, and the serifs were left at the ends of the brushstrokes 
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Baskerville: The quick brown fox Arial: The quick brown fox jumps over 
jumps over the lazy dog. the lazy dog. 


Garamond: The quick brown fox Comic Sans: The quick brown fox 
jumps over the lazy dog. jumps over the lazy dog. 


Palatino: The quick brown fox Tahoma: The quick brown fox jumps 


jumps over the lazy dog. over the lazy dog. 


Times New Roman: The quick brown | Verdana: The quick brown fox 
fox jumps over the lazy dog. jumps over the lazy dog. 


Fig. 1.1 Examples of common serif typefaces (Baskerville, Garamond, Palatino, and Times New 
Roman) and common sans serif typefaces (Arial, Comic Sans, Tahoma, and Verdana) 


(Catich, 1991). By the beginning of the Common Era, Roman inscriptions used a 
standard alphabet of capital letters in which serifs were a characteristic feature, and 
many examples have survived in their original locations or else in museums to the 
present day. A commonly cited example is the inscription on the base of the column 
that was completed in AD 113 to commemorate the emperor Trajan’s victory in 
the Dacian Wars, shown in Fig. 1.2. The column itself survives largely intact in the 
otherwise ruined remains of Trajan’s forum near the Piazza Venezia in the centre of 
modern Rome. 

Smaller versions of these letters were used when writing books. However, towards 
the end of the eighth century AD, a simplified version of this alphabet, nowadays 
known as the “Carolingian minuscule,” was introduced across the Holy Roman 
Empire as a standard handwritten script for rendering the Vulgate Bible in Latin. This 
incorporated strokes that extended above or below the main body of each letter but 


SENATVSPOPVLVSQVEROMANVS 7 
IMP-CAESARLDIVINERVAEF-N ERVAE 
TRAIANOAVGGERMOJACICOPONTIF 
MAXIMOTRIBPOTXVIHMPVICOSVIPP 
ADDECLARANI WVMOVANTAEALTITVDINIS 
MONSETLOCVSTAN* = IBVSSITEGESTVS 


Fig. 1.2 The surviving inscription on the base of Trajan’s Column. Licensed under the Creative 
Commons Attribution—Share Alike 3.0 Unported (CC BY-SA 3.0), https://commons.wikimedia. 
org/wiki/File:Base_columna_trajana.jpg 
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retained the use of serifs. In the early fifteenth century, Italian calligraphers combined 
the Roman capitals with the Carolingian minuscule; these were used as the basis for 
the earliest Western typefaces and evolved into the combination of uppercase and 
lowercase alphabets that is used in Western countries today (see Bigelow, 1981, for a 
more detailed account, of which this is mainly a summary). Due to the origins of their 
capital letters in inscriptions from the Roman Empire, serif typefaces are sometimes 
referred to as Roman or roman. For instance, Times New Roman was developed by 
Stanley Morison, working for the Monotype foundry, for use in the London news- 
paper, The Times, in 1932. A rival printing company, Linotype, developed a similar 
typeface known as “Times Roman” (or even just as “Times”’). 

Several theories have been put forward regarding why serifs should have survived 
in modern typography: 


e Early researchers sometimes claimed that serifs provided additional visual cues 
to enable readers to direct their gaze at successive words in a line of text. This 
idea can be found even in some modern accounts. However, the work of Hering 
(1879) and Lamare (1892) showed that the eye movements of experienced readers 
consist not in a continuous horizontal gaze but in a series of discrete fixations 
separated by jumps or “saccades”. (This finding is often erroneously attributed 
to their colleague, Louis Emile Javal: see Wade & Tatler, 2008, for discussion of 
this issue.) 

© Other early researchers argued that serifs helped to overcome the harmful effects 
of “irradiation” (see Pyke, 1926, pp. 21, 99-101, for examples). The latter is a 
well-established optical illusion whereby a dark figure that is presented against 
a light visual field appears to be larger than an otherwise identical light figure 
that is presented against a dark visual field. Taylor (1934) claimed that irradiation 
explained why letters printed in serif typefaces were harder to read when shown in 
white print against a black background than when shown in black print against a 
white background. Nevertheless, the relevance of this phenomenon to the legibility 
of typefaces is otherwise unclear. 

e Robinson et al. (1971) hypothesised that serifs facilitated the operation of line 
detectors in the human visual system. They found evidence in support of this idea 
using a computer model of visual processing. Nevertheless, computational models 
which assume that specific and unique brain cells are dedicated to the detection of 
lines or other features have since been criticised in favour of connectionist models 
which assume that groups of brain cells function as a distributed network (e.g., 
Schiffman, 2000, pp. 83-85, 163—166). 


1.3 Sans Serif Typefaces 


Sans serif typefaces are presented without serifs. (In English-speaking countries, 
typographers have variously used the expressions sans-ceriph, sans-serif, sans- 
surryph, sanserif , and sansserif: Mosley, 1999, p. 53.) In contrast to serif typefaces, 
the strokes constituting each letter are often of constant thickness. The right-hand 
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panel of Fig. 1.1 shows examples of common sans serif typefaces (Arial, Comic 
Sans, Tahoma, and Verdana) as they are currently rendered in Microsoft Word. 
Early Greek and Etruscan inscriptions routinely employed a sans serif style 
(Mosley, 1999, p. 17), and they were widely adopted during the Roman Republic 
(from 509 to 27 BC) (Bringhurst, 2019, p. 261). In contrast to serif inscriptions, rela- 
tively few examples survive today (Lightfoot, 2009), partly because of the poorer 
quality of the material in which they were inscribed (wood or local stone, rather than 
marble) and partly because from time to time the Republican authorities discouraged 
the construction of inscribed monuments. Nevertheless, the National Archeological 
Museum in Aquileia in north-eastern Italy contains many Republican inscriptions 
using sans serif capital letters (Clough, 2015), and about 150 of these are displayed 
in the online Lupa database (http://lupa.at). Figure 1.3 shows an inscription dating 
from the middle of the second century BC that was discovered in the area of the 
Roman forum in Aquileia in 1995. It came from a monument (now lost) to Titus 
Annius Luscus, who was elected as praetor in 156 and consul in 153 BC. Clough 


Fig. 1.3 An inscription in honour of Titus Annius Luscus. Reproduced by kind permission of James 
Clough from https://articles.c-a-s-t.com/letter-hunting-in-italy-2-e7b5 1cd821a6 
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(2020) provided a more detailed discussion of this inscription. (Iam grateful to James 
Clough for permission to reproduce his photograph of the inscription here.) 

Sans serif inscriptions had a brief revival in the fifteenth century, when they were 
used to decorate buildings and monuments in a number of Italian cities (Clough, 
2020; Gray, 1960). The origins of the style are a matter of debate (Stiff, 2005). 
Nevertheless, these developments had little or no effect on the evolution of early 
typefaces. Instead, early typographers used serif typefaces modelled on surviving 
inscriptions from Imperial Rome, and these had become widely adopted by the end 
of the eighteenth century. 

During the latter part of the eighteenth century and the early nineteenth century, 
sans serif inscriptions became popular on monuments, public buildings, and garden 
features in the United Kingdom, where they were seen as being more natural or prim- 
itive than serif inscriptions. Between 1800 and 1820, hand-etched sans serif letters 
were used in publicity materials printed for commercial signwriters or engravers 
and on the title pages of British catalogues of antiquities, while lowercase sans serif 
letters were engraved from around 1810. At this time, the style was described as 
“Egyptian”, and it was this term that was used to describe the first uppercase sans 
serif typeface produced in 1816. However, the first sans serif typeface produced in 
both uppercase and lowercase in 1832 was described as “sans-serif”, and variations 
of this term were used thereafter. Such typefaces were occasionally referred to as 
“antique” or “grotesque”, and the equivalent terms were adopted by typographers in 
France and Germany, respectively, later in the nineteenth century (see Mosley, 1999, 
for a more detailed account, of which this paragraph is mainly a summary; see also 
Mosley, 2007). 

Possibly in reaction to the popularity of sans serif styles, some printers devised 
a variant of the serif style in which the serifs were of a similar thickness to the 
letters’ main strokes. These were used in wood blocks from 1810 and in metal type 
from 1817. Somewhat confusingly, these typefaces were referred to as “antique” in 
the United Kingdom and the United States. However, they were also occasionally 
referred to as “Egyptian”, and the equivalent terms were adopted by typographers in 
France and Germany. In English-speaking countries, they are nowadays referred to 
as “slab serif” typefaces (Mosley, 1999, pp. 42, 56; 2007). Figure 1.4 shows a typical 
example of a Snellen chart, used to assess visual acuity (to be discussed in Sect. 4.3). 
This uses slab serif symbols, each constructed within a 5 x 5 grid. 

From the 1830s, sans serif typefaces were widely used in commercial printing 
(Mosley, 1999, p. 43). Initially, they were mainly used for display purposes (Kinross, 
1992, pp. 28-29; McLean, 1980, p. 64). Nowadays, as Perea (2013) noted, sans serif 
typefaces are widely used in many countries for public direction signs, although it 
caused some controversy when they were introduced for a new motorway (freeway) 
system in the United Kingdom in 1959 (see Lund, 1999, pp. 126-147). Some 
foundries in the United Kingdom adopted the term “gothic” for sans serif typefaces 
around the middle of the nineteenth century, and this term became generally used in 
the United States (Mosley, 1999, pp. 55-56). There was much global interchange in 
the evolution of typography, and the distinction between serif and sans serif typefaces 
appears to be universal in countries that have adopted a Western alphabet (Ovink, 
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1 20/200 
2 20/100 
3 20/70 
4 20/50 
5 20/40 
6 20/30 


FELOPZD 7 20/25 


DEFPOTEC 8 20/20 

SSS 

LEFODPCT 9 
FDPLTCEO 10 
PEZOLCFEFTD 11 


Fig. 1.4 A typical Snellen-type chart, used to evaluate visual acuity. Licenced under the Creative 
Commons attribution-Share Alike 3.0 Unported (CC BY-SA 3.0), https://commons.wikimedia.org/ 
w/index.php?curid=4262200 


1938, pp. 188-228). More recently, this has been compounded by the hegemony of 
word-processing software originating in the United States. 

The distinction between serif and sans serif typefaces applies to most typefaces 
that are designed for reading over long stretches of text. It is less applicable to other 
kinds of typeface, such as display typefaces that are designed to attract attention 
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(for instance, in advertisements or logos) and cursive typefaces that are intended to 
mimic handwriting. 


1.4 Review Methodology 


This book reports the findings of a systematic review of research comparing the 
legibility of serif typefaces and sans serif typefaces. As Uman (2011, p. 57) explained, 
“narrative reviews” (in other words, conventional literature reviews) 

can often involve an element of selection bias. They can also be confusing at times, partic- 

ularly if similar studies have diverging results and conclusions. Systematic reviews, as the 

name implies, typically involve a detailed and comprehensive plan and search strategy 
derived a priori, with the goal of reducing bias by identifying, appraising, and synthesizing 

all relevant studies on a particular topic. 

Readers may be aware of the systematic reviews from the Cochrane Collaboration, 
an international organisation created in 1993 to focus on health-related issues. A 
complementary organisation, the Campbell Collaboration, was established in 2000 
to focus on social-related issues (see Noonan & Bjørndal, 2010). 

The first stage in any systematic review is to identify a search strategy based upon 
key terms. This can be simple or complex and in clinical research can involve the 
specification of inclusion or exclusion criteria. In this case, the aim was to identify all 
previous studies which had endeavoured to compare the legibility of serif and sans 
serif typefaces. It was therefore decided to use the single key term serif. “Legibility” 
is intrinsically a psychological concept with educational applications, but it was 
recognised that the legibility of serif and sans serif typefaces might vary across 
particular clinical populations. Accordingly, the following online databases were 
deemed relevant: 


e APA Psyclnfo  (https://www.apa.org/pubs/databases/psycinfo; formerly 
PsycINFO) contains approximately 5 million records, mainly relating to 
peer-reviewed publications. It subsumes the journal Psychological Abstracts, 
which went back to 1894, but it also contains some earlier publications. 

e ERIC (https://eric.ed.gov/) is the bibliographic database of the US Education 
Resources Information Center. It contains more than 1.6 million records of 
education-related materials. The collection was initiated in 1966, although it 
contains some earlier material. In the past, authors could use it as a repository 
for their own material, and so a proportion of the records relates to “grey” liter- 
ature that has not been peer-reviewed. However, with effect from January 2016, 
ERIC introduced a selection policy that limited new records to material that had 
undergone some kind of review process. 

e MEDLINE (https://www.nlm.nih.gov/medline/index.html) is the bibliographic 
database of the US National Library of Medicine. It contains more than 27 million 
records relating to journal articles in the life sciences with a focus on biomedicine. 


All three databases are accessible through EBSCO Information Services, which 
means that they can be searched simultaneously to avoid duplicate results. 
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These three databases were searched repeatedly during the period 2019-2021 to 
find publications containing the term serif in their titles, abstracts, keywords, or 
metadata. This led to a high number of false positives mainly because “Serif” is a 
common first name and family name in Turkey. It was also suspected to lead to a 
high number of misses, because informally it was noted that some relevant sources 
were not covered by any of the three databases. The results were therefore used as a 
basis for the conventional procedures of backward searching and forward searching. 
The former refers to the examination of previously published sources cited by the 
obtained hits, while the latter refers to the examination of subsequently published 
sources that cite one or more of the obtained hits. This process was facilitated by 
employing the database Web of Science (https://clarivate.com/products/web-of-sci 
ence/, formerly Web of Knowledge), which enables searching among 79 million 
cited or citing sources in the form of books, journals, and conference proceedings. 

When there is sufficient commonality with regard to research methods, it is 
possible to integrate the quantitative findings of a systematic review using statis- 
tical techniques, thus yielding a single overall estimate of a difference, variation, or 
effect size. Such an approach is known as “meta-analysis”. A classic example is the 
analysis of differences between men and women in their performance on particular 
cognitive tests or similar tasks (for example, see Caplan et al., 1997). Nevertheless, 
despite the apparently simple nature of the research question, the present literature 
review yielded extremely diverse methods of data collection and analysis, and this 
ruled out any formal statistical meta-analysis to integrate the research findings from 
the wide variety of studies that will be described. 

The alternative approach is to rely on “vote counting” (sometimes known as the 
“box score” approach). For present purposes, different studies are sorted according to 
whether their results are statistically significant favouring serif typefaces, statistically 
significant favouring sans serif typefaces, or not statistically significant, and I shall 
invite readers to plump for the majority outcome across these three categories in 
particular tasks, in particular contexts, and in particular subject populations; in other 
words, I shall focus readers’ attention on the most common finding or, as statisticians 
would say, the modal finding. Such an approach is not without its hazards (see Caplan, 
1979; Maccoby & Jacklin, 1974, pp. 355-356), and it can in theory lead to misleading 
conclusions (see Hedges & Olkin, 1985, pp. 48-52). Nevertheless, it has the major 
advantage over the use of narrative review that the criteria and standards used for 
the selection and interpretation of individual studies have been made totally explicit, 
and hence the findings can be readily replicated. In fact, in most cases the results 
are sufficiently unambiguous that readers should have little difficulty sharing my 
conclusions. 

The final methodological point is that there is no reason to think that typographical 
features have the same consequences when people are reading from paper or other 
hard surfaces and when they are reading from computer monitors or other screens. 
It follows that reviews which indiscriminately combine research on reading from 
paper with research on reading from screens (e.g., Chung, 2020) are unlikely to be 
informative. Accordingly, Part I of this book reviews the research literature regarding 
the legibility of serif typefaces and sans serif typefaces when they are used to generate 
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material that is printed on paper, and Part II reviews the research literature regarding 
the legibility of serif typefaces and sans serif typefaces when they are used to produce 
material that is to be viewed on display screens or by means of other kinds of 
technology. 


1.5 Conclusions 


This chapter has introduced the distinction between serif and sans serif typefaces. 
There seem to be widespread assumptions about their relative legibility both on paper 
and on screens. The chapter also described the methodology of systematic review 
employed to address this issue. 
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Chapter 2 A) 
Concepts and Research Methods rie 


2.1 Concepts 


Since the introduction of movable type in Western countries during the fifteenth 
century, many thousands of different typefaces have been designed for use in printed 
material. A typeface can be expressed in several different fonts (bold, italic, etc.) by 
varying the weight, width, and style of individual characters. Since the seventeenth 
century, there has been an alternative use of font (and its variant fount) as a synonym 
for typeface, and this has become more common since the introduction of digital 
typography (Oxford University Press, n.d.). Nevertheless, for consistency, the words 
typeface and font will be used in their original senses in this book; thus, a typeface is 
comprised of a family of related fonts. In Sect. 1.2, for example, Fig. 1.1 showed eight 
different typefaces, and both the name of each typeface and the example sentence 
(the pangram) were shown in the typeface’s regular font. 

Typefaces are designed to be read, and thus an obvious research question is 
whether different typefaces vary in how legible they are for readers. Readable can 
be used as a synonym of legible, although there are technical definitions of both 
legibility and readability that go beyond their daily use. Some researchers have used 
“legibility” to refer to the recognition and the identification of individual letters or 
words and “readability” to refer to the reading and the understanding of connected 
prose. Others have devised “readability formulas” to measure the level of mental 
difficulty involved in reading specific material. Yet others have used “readability” 
to refer to the extent to which a typeface is subjectively appealing or comfortable 
to the reader. Even so, as Chomsky (1970) pointed out, in many current versions of 
English, readable is much more sharply restricted in meaning than “able to be read”: 
it is instead often used to mean how easy, enjoyable, or engaging a work is to read, 
as in “This is a most readable novel”. (Chomsky explained that this phenomenon 
was problematic for theories of transformational grammar.) Consequently, legible 
and legibility will be used throughout this book. 

The legibility of typefaces is pertinent to a wide variety of everyday settings, but 
it is particularly relevant for the field of education. First, much of the information 


© The Author(s) 2022 11 
J. T. E. Richardson, The Legibility of Serif and Sans Serif Typefaces, 
SpringerBriefs in Education, https://doi.org/10.1007/978-3-030-90984-0_2 


12 2 Concepts and Research Methods 


that is acquired by students is delivered in books, articles, or other printed documents 
presented either on paper or computer screens or in printed displays projected using 
PowerPoint or other software. Second, students often submit their work to be evalu- 
ated by their teachers or other assessors in the form of word-processed documents, 
which raises the issue of their legibility for those teachers and assessors, and which 
led to the question that gave rise to this project. 


2.2 Objective Methods for Measuring the Legibility 
of Typefaces 


Attempts to measure the legibility of printed material go back at least to the 1880s. 
Tinker (1963, pp. 5-7, 9-31) provided a useful summary of the relevant methods of 
investigation (see also Pyke, 1926, pp. 25-34; Reynolds, 1979; Zachrisson, 1965, 
pp. 44-69). The following list of methods is a paraphrase based mainly upon Tinker’s 
account, but it covers most of the techniques that have been used to measure the 
legibility of printed material. Most of them can be applied equally to measure the 
legibility of material presented on computer monitors or other screens. 


e Short-exposure method. Printed symbols are briefly presented (e.g., by means 
of a tachistoscope, which carefully controls the duration of a presentation using 
shutters and mirrors) to measure the speed or accuracy with which they can be 
perceived and reported. 

e Distance method. Printed symbols are presented in clear view but at a distance 
from the observer. The material is then moved towards the observer in gradual 
steps to measure the furthest distance at which they can be perceived and reported 
correctly. A variant is where the observer gradually approaches the stimulus. 
Similar techniques to compare the legibility of different typefaces have been 
employed since the eighteenth century (Kinross, 1992, pp. 23-24). 

e Perceptibility in peripheral vision. Printed symbols are presented to one side 
or the other of a central fixation point to measure the furthest horizontal distance 
at which they can be perceived and reported correctly. Similar effects can be 
obtained using the “focal variator” (Weiss, 1917), which uses a system of lenses 
to project a visual stimulus onto a ground glass screen to varying degrees out of 
focus. 

e Visibility threshold. Printed symbols are viewed through two photographic filters 
with precise circular gradients of density which are rotated until the material can 
be perceived and reported correctly. The filters reduce the apparent brightness of 
the material and also lower the contrast between the material and its background 
(Luckiesh & Moss, 1935, 1942, pp. 71-79). 

e Reflex blink method. The observer reads text, and the experimenter counts the 
number of involuntary eye blinks made during a standard observation interval. 
This assumes that the blink rate is reduced and the reader’s progress faster with 
more legible text. 
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e Rate of work. This covers a variety of tasks including speed of reading, amount 
read in a specified time limit, the time taken to look up specific information in 
printed sources such as telephone numbers or functions in mathematical tables, 
and output in tasks involving visual discrimination. 

e Eye movements. The observer is asked to read continuous text, and the experi- 
menter measures the number of fixations and the number of jumps or saccades 
between successive fixations. This assumes that more legible text results in shorter 
fixations and fewer saccades. 

e Fatigue in reading. This approach is concerned not with visual fatigue in reading 
per se, which has proved consistently difficult to measure; rather, legibility is 
defined in terms of the ease, accuracy, or efficiency of perceiving printed symbols 
while reading for understanding. 


Of course, new technologies to measure legibility have been introduced over 
the years. For instance, to employ the short-exposure method, Cattell (1885) used 
a “gravity chronometer” in which printed material was obscured by a vertically 
sliding panel held in place by an electromagnet. On its release, the panel fell, and the 
material was visible for a brief period of time through a small window in the panel. 
Cattell found that both uppercase letters and lowercase letters varied considerably 
in their legibility. More sophisticated tachistoscopes became available in the early 
twentieth century. Since the 1970s, technology has included cathode-ray tubes and 
liquid crystal displays, and these will be discussed in Part II. Again, studies of eye 
movements in reading have become more popular with the use of computer-based 
eye-tracking devices, and these will also be discussed in Part II. 

Not only have researchers adopted different methods for measuring the legibility 
of printed material, but they have presented their participants with different kinds 
of material: individual letters or other characters; letter strings that do not constitute 
words; individual words; sequences of unrelated words; strings of words that consti- 
tute grammatical sentences; or coherent grammatical prose. The materials towards the 
beginning of this list afford more opportunity to control the participants’ behaviour, 
whereas the materials towards the end of the list are more akin to those encountered 
in everyday reading situations. As in other kinds of educational and psychological 
research, there is a trade-off between experimental rigour and “ecological validity” 
(i.e., whether the findings can be generalised to real-life settings). 

In particular, Kinross (1992, p. 32) noted that most of the research carried out 
before the end of the nineteenth century had tested the recognisability of individual 
letters rather than the legibility of words or passages of text. As he pointed out, it took 
a change in the theoretical climate around 1900 for legibility to be interpreted as the 
comprehension of meaning: “not recognition, but reading.” Tinker (1963) went so 
far as to propose the following conclusion: “Research dealing with individual letters 
or letters grouped in nonsense arrangement offers little that is important concerning 
the legibility of type faces. Satisfactory results are obtained by measuring speed of 
reading continuous, meaningful material” (pp. 65—66). 

It is important to distinguish between the legibility of different typefaces and 
readers’ familiarity with these typefaces. This is reflected in a study of binocular 
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rivalry by Zachrisson (1965, pp. 128-131). The latter phenomenon occurs when a 
stimulus is presented to one eye but another stimulus is presented to the other eye: 
instead of seeing the stimuli superimposed on each other, most observers report 
seeing images of the stimuli alternating with each other. Zachrisson presented 28 
students with individual words. In each case, the word was presented in the serif 
typeface Imprint to one eye and in the sans serif typeface semi-bold Grotesk to 
the other eye. The participants pressed one of two buttons to report which typeface 
they were seeing over a period of 3 min. The results showed a strong dominance 
of the serif typeface over the sans serif typeface, regardless of the eyes to which 
they were presented. Zachrisson repeated his study with 9-year-old children and 
found that the dominance of the serif typeface was much weaker. He took this to 
reflect the children’s reduced familiarity with these letter forms. The implication 
is that the stronger ocular preference seen in adults was mainly due to their more 
extensive experience of reading documents (such as newspapers, magazines, reports, 
and books) in serif typefaces rather than to any inherent superiority in their legibility. 


2.3 Subjective Methods for Measuring the Legibility 
of Typefaces 


Researchers have also collected subjective reports from their participants concerning 
the legibility of different typefaces. Examples of different typefaces can be presented 
either individually or in groups of two or more for comparison. The self-reports can 
be collected either informally (for example, through interviews) or more formally 
(for example, through the use of rankings or rating scales). Pyke (1926, pp. 58-59) 
asked 60 participants to rank order eight typefaces (including one sans serif typeface, 
Lining Grotesque) in terms of their “relative merits”. He found that the participants 
gave various reasons for their choices, and many found it difficult to differentiate 
between the typefaces on this basis. He found that there was a correlation of just 
+0.46 with the rank order of performance in a speed-of-reading test, and he concluded 
that the relationship between the two measures was unclear. 

Tinker and Paterson (1942) asked a group of participants to arrange samples of 
ten different typefaces “in order from most legible to least legible” (p. 38). Tinker 
(1944) then obtained results on the legibility of the ten typefaces using the visibility 
threshold method, which he referred to as their “visibility”. He had existing data on 
the legibility of the same typefaces using the distance method (which he referred to as 
their “perceptibility”) as well as data on their legibility using their speed of reading. 
He found a high positive correlation between the ranks of their visibility and their 
perceptibility, which suggested that the two measurements had much in common. 
Nevertheless, their ranked visibility and perceptibility both showed a modest negative 
relationship with their ranked speed of reading. In addition, their judged legibility 
showed a high positive correlation with their ranked visibility and perceptibility. 
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However, consistent with the results obtained by Pyke (1926), their judged legibility 
showed only a correlation of +0.33 with their ranked speed of reading. 

These findings could be interpreted in a number of different ways. Tinker himself 
argued that both visibility and perceptibility at a distance represented abnormal and 
artificial reading situations, even though they were highly correlated with judged 
legibility. Instead, he argued that speed of reading constituted the best possibility 
as a measure of legibility since it provided “measurement in a normal, ordinary 
reading situation” (Tinker, 1944, pp. 393-394). In addition, he claimed that subjective 
judgments of legibility “can only be considered as an expression of preference which 
may be employed to advantage in a practical way for the guidance of printers when 
there is a choice to be made between equally readable typographical arrangements” 
(pp. 394-395). Even so, Tinker’s results imply that the techniques listed in Sect. 2.2 
do not simply constitute alternative measures of a single construct of legibility. Any 
research findings with regard to the legibility of serif and sans serif typefaces therefore 
need to be qualified by a clear explanation of the measure or measures of legibility 
on which they are based, and this practice will be adopted in this book. 

Participants’ preferences might not be a reliable indicator of the objective legibility 
of different typefaces, but they may well have practical consequences. Song and 
Schwarz (2008b) carried out three studies in which the participants read instructions 
for carrying out a particular task printed either in a plain sans serif typeface (Arial in all 
three studies) or in an elaborate cursive typeface (Brush455 BT or Mistral in different 
studies). In all three experiments, the sans serif typeface was rated as easier to read 
than the cursive typeface, but there was no difference in the participants’ memory 
for particular details in the instructions. The participants who read the instructions 
in the cursive typeface reported that the task would take more time, would feel less 
fluent and natural, and would require more skill, and that they were less willing 
to engage in the task than were the participants who read the instructions in the 
sans serif typeface. Song and Schwarz concluded that the participants had mistaken 
the ease of processing the instructions as indicating the ease with which the relevant 
tasks could themselves be executed. Song and Schwarz (2008a) showed that the same 
manipulation affected how participants answered distorted and undistorted questions 
based on their general knowledge. 


2.4 The Size of Typefaces 


It might seem plausible that the legibility of different typefaces depends on their size. 
In fact, Legge and Bigelow (2011) showed that legibility was essentially constant 
across the range of type sizes that readers might encounter in books, magazines, and 
newspapers. Nevertheless, comparing the physical size of different typefaces is not 
a straightforward matter. 

Traditionally, the overall height of typefaces (technically known as their body size) 
has been expressed in terms of points, where one point is approximately equal to 
0.35 mm. However, the size of typefaces is also expressed in terms of the dimensions 
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Fig. 2.1 Key concepts in the measurement of typefaces. From Effects of printing types and formats 
on the comprehension of scientific journals (Applied Psychology Unit Report No. 346), by E. C. 
Poulton, 1959. UK Medical Research Council, Applied Psychology Unit. Used by kind permission 
of the Medical Research Council, as part of UK Research and Innovation 


of lowercase letters. Some lowercase letters have features that extend above their 
main parts (e.g., b and d); these are called ascenders. Others have features that extend 
below their main parts (e.g., p and q); these are called descenders. The x-height of 
a typeface is the height of lowercase letters that do not have either ascenders or 
descenders (such as the letter x itself). Finally, the cap-height of a typeface is the 
height of capital (or uppercase) letters, which may or may not be the same as the 
height of ascenders. Key concepts in the measurement of typefaces are summarised 
in Fig. 2.1. 

The body size of a typeface is thus made up of its x-height and the combined 
heights of the ascenders and descenders, plus small margins above the tops of 
ascenders and below the bottoms of descenders. (The latter are known as leading, 
pronounced “ledding”’. This term originated in the practice of using thin strips of 
lead to separate lines of text in order to increase the vertical space between them. 
Such terminology was developed in the age of movable type, but it has been carried 
over into electronic printing, where it is also known as interline spacing.) When 
comparing different styles of typeface, researchers have often matched them on the 
basis of their body size. Nevertheless, Poulton (1972) noted that this does not equate 
the sizes of the individual letters, as measured by their x-height. In general, it is often 
not possible to match pairs of typefaces simultaneously on the basis of both their 
body size and their x-height. 

Poulton (1972) simulated the situation of a shopper looking for a particular item 
in the list of ingredients on a package of food. Because food containers are often quite 
small, the typefaces used for lists of ingredients are themselves usually quite small. 
Poulton tried to determine the minimum legible size of lowercase letters printed in 
one of two serif typefaces (Times New Roman and Perpetua) or in one sans serif 
typeface (Univers). He asked a total of 264 adult volunteers to find a designated 
target word within each of 15 lists of food ingredients, for which they were allowed 
25 s. The number of target words found within this time limit was used as a measure 
of legibility. 
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Poulton found that performance markedly declined when the x-height of a typeface 
was less than 1.2 mm. He also found that Times New Roman and Perpetua yielded 
similar results, even though the latter’s body size was more than 30% greater than 
that of the former. He inferred that body size was not an important determinant 
of legibility. Performance was significantly poorer with Univers than with either 
Times New Roman or Perpetua, but not when the x-height of the two latter typefaces 
was reduced photographically to match that of the former. Poulton concluded that 
Univers was less legible than the other typefaces because of its smaller x-height and 
not because of the absence of serifs. In fact, typographers have believed for a long 
time that the visual impact of lowercase letters is determined by their x-height rather 
than by their body size (Craig, 1971, p. 24; Williamson, 1966, p. 37). 


2.5 Conclusions 


This chapter clarified the distinction between typefaces and fonts and that between 
legibility and readability. It described the variety of objective methods that have been 
used to measure the legibility of printed material and the different ways of collecting 
subjective reports from participants regarding the legibility and other properties of 
presented material. Most of these techniques have been taken over into research on 
reading from screens. Finally, this chapter described how typographers define the 
size of typefaces and discussed which aspects of the size of typefaces are likely to 
affect the legibility of material. 
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Reading from Paper 


Chapter 3 R) 
“Everybody Knows”: Reading get 
from Paper 


3.1 Attitudes of Typographers 


Section 1.1 noted the uncritical acceptance of the view that serif typefaces were 
easier to read on paper and other hard surfaces than were sans serif typefaces. This 
view has traditionally been adopted by typographers and typography educators (see, 
e.g., Craig, 1971, pp. 123-125; Hamai, 1986, p. 7; McLean, 1980, p. 44; Williamson, 
1966, p. 109). Such authors have often relied on their attitudes and experience (and, 
sometimes, their authority and influence), but they show little awareness that the 
issue might be subjected to formal empirical research. The most extreme position 
was adopted by Morison (1959), who had developed the serif typeface Times New 
Roman in 1932 for the London newspaper, The Times (see Sect. 1.2); Morison insisted 
that “the serif is essential to the reading of alphabetical composition in long passages 
and consecutive pages” (p. xi), but he provided no evidence for this assertion. In 
short, as far as 20th-century typography was concerned, it was definitely a matter of 
“everybody knew” that serif typefaces were easier to read than sans serif typefaces 
when printed on paper. 

These views have often been uncritically incorporated into the guidelines for 
potential authors that have been produced by journal editors and publishers. Such 
guidelines tend to include assumptions about the legibility of serif and sans serif 
typefaces, typically without providing any arguments or evidence in support of 
those assumptions. One characteristic example can be found in the sixth (2010) 
edition of the Publication Manual of the American Psychological Association. In a 
section headed “Preparing the Manuscript for Submission,” readers were instructed 
as follows: 


A serif typeface . . . is preferred for text because it improves readability and reduces eye 
fatigue. (A sans serif type may be used in figures, however, to provide a clean and simple 
line that enhances the visual presentation.) (American Psychological Association, 2010, 
pp. 228-229). 


This is contrary to good practice in psychology and the social sciences, where it 
is nowadays expected that assertions of this kind will be supported by citations of 
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published (and, ideally, peer-reviewed) research, not based on presumed authority 
or ex cathedra pronouncements. Even so, the Association’s guidelines were adopted 
by the American Educational Research Association and by many other organisations 
both in the United States and around the world. 

Another example can be found in Merriam-Webster’s Manual for Writers and 
Editors (1998), although this did at least allude to the existence of empirical evidence 
on the issue: 


Serif faces are somewhat easier to read in blocks or paragraphs of text than sans-serif faces. . 
. . Studies of typeface legibility have tended to demonstrate that standard serif typefaces can 
be read somewhat more easily and quickly than standard sans-serif typefaces. (pp. 329-330). 


However, since no such studies were explicitly cited, it is impossible for an interested 
but sceptical reader to determine whether or not the Manual’s account was accurate. 


3.2 Dissenting Voices 


During the twentieth century, there were few dissenting voices from this dominant 
view within typography. Although he had been one of Morison’s colleagues, Dreyfus 
(1985, p. 19) stated: “The outcome of many experiments indicates there is no statis- 
tically significant difference between the legibility of a wide variety of text types, 
even between seriffed and unseriffed types.” Unfortunately, he failed to specify which 
“experiments” he had in mind. 

In the twenty-first century, dissenting voices have tended to come from the editors 
of journals in medicine and bioscience. They would, of course, be comfortable with 
the idea that the legibility of different typefaces could be the subject of empirical 
research, but their comments suggest that they typically lacked specialised knowledge 
of this research literature. In 2004, for example, the Journal of Psychopharmacology 
moved from a serif typeface to a sans serif typeface, which the editor claimed (once 
again without citing any sources) “should improve visual impact and reading ease 
of the Journal” (Nutt, 2004, p. 5). 

In 2011, the editor of the Spanish journal Revista Espanola de Anestesiologia y 
Reanimación, notified its readers of a “new look” for the journal (Errando Oyonarte, 
201 1a). (It specialises mainly in anaesthesiology, resuscitation, and pain manage- 
ment.) He mentioned a number of changes in the appearance and the format of the 
journal with the aim of making it more pleasant to read. It is interesting that the 
editor did not claim that these changes would necessarily render its contents more 
legible, simply that they would make its contents more attractive to its target read- 
ership. Among other changes, the journal had employed a different typeface, but in 
his initial announcement the editor did not specify the typeface in question. 

One reader, Gonzdlez-Rodriguez (2011), pointed out that the journal had adopted 
a palo seco or sans serif typeface for both the headings of articles and their text. (The 
expression palo seco literally means “dry stick” in Spanish. It is used as a technical 
phrase by Spanish typographers, but there appears to be no counterpart expression 
to refer to serif typefaces. Some authors use tipografía con remates and tipografía 
sin remates—literally, with and without finishing—and others use tipografia con 
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adornos and tipografia sin adornos—literally, with and without ornamentation— 
while others simply borrow the foreign terms serif and sans serif.) The reader 
complained that the exclusive use of palo seco violated the general custom in Spanish 
typography of using sans serif typefaces for headings but serif typefaces for the body 
text of articles. Specifically, he claimed that adopting a serif typeface facilitated a 
reader’s eye movements in following a line of text. (This idea was discussed but 
dismissed in Sect. 1.2.) In contrast, he claimed that the adoption of a sans serif type- 
face meant that readers’ eye movements were in danger of being lost in a “river” of 
white spaces. 

Gonzalez-Rodriguez allowed two exceptions to this generalisation. One involved 
the preparation of texts for younger readers; the other was the situation of reading 
written texts on computer screens. In both cases, he argued, sans serif typefaces 
provided a clearer image. He cited one study with regard to younger readers, who 
will be considered in Chap. 6 of this book; nevertheless, his claim is clearly not 
relevant to the task of experienced clinicians reading articles in a specialist journal. 
He cited no empirical evidence with regard to reading from computer screens, and 
so this is just another example of “everyone knows” discussed in Sect. 1.1. The issue 
of reading from screens will be the focus of Part II of this book. 

In his response, the editor identified the sans serif typeface that the journal was 
now using as Akzidenz-Grotesk (Errando Oyonarte, 2011b). As he pointed out, 
this was devised as long ago as 1898 and had since been used by a wide variety 
of organisations and agencies; hence, it was by no means a novel and untested 
intrusion into the publishing world. Even so, he acknowledged that the panel which 
had recommended the “new look” for the journal had not included any experts on 
typography. He also argued that the journal should be open to the possibility that 
readers would increasing tend to access articles online rather than printed on paper. 
On Gonzalez-Rodriguez’s (2011) account, therefore, the adoption of a sans serif 
typeface should actually enhance the journal’s legibility for those readers. The editor 
agreed to keep the situation under review in the future, although in fact at the time of 
writing the journal still uses the same sans serif typeface throughout. Indeed, since 
2013 the journal has published articles in both Spanish and English using the same 
appearance, the same format, and the same sans serif typeface. 

There is one example of a journal that has flip-flopped on this matter. In 2009, 
the journal Brain moved from a serif typeface to a sans serif typeface, but in 2015 it 
moved back to a serif typeface; on the latter occasion, the editor made the comment: 
“Whether serif or sans-serif typefaces are more readable has been addressed sporad- 
ically with psychophysical studies, without a clear conclusion” (Kullmann, 2015, 
p. 1). Whether this statement provides an accurate account of the relevant research 
literature is the focus of Part I of this book. 
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3.3 Are Serifs Purely Decorative? 


One simple idea can be rejected at the outset. This is the assumption that serifs are 
purely decorative and superfluous to the task of identifying individual letters (e.g., 
Arditi & Cho, 2005; Burt, 1959, p. 8). There is good experimental evidence that 
readers identify the specific typefaces that they are reading before they identify the 
individual letters or words presented in those typefaces. For instance, words take 
longer to identify if their constituent letters are printed in different typefaces than if 
they are printed in one single typeface. This is true, in particular, if the letters are 
printed in both serif and sans serif typefaces (Adams, 1979; Krulee & Novy, 1986; 
Sanocki, 1987, 1988). Neurophysiological research indicates that typeface-specific 
information is processed within the right hemisphere of the brain, whereas typeface- 
independent information is then processed within the left hemisphere of the brain 
(Schweinberger et al., 2006; Vaidya et al., 1998). 

Weaver (2014) described the case of a 52-year-old woman with a history of 
complex epileptic seizures that in recent years had been triggered specifically by 
reading. She herself had observed that her seizures were associated with reading 
material printed in serif typefaces (such as Palatino or Times New Roman) but not 
with reading material printed in sans serif typefaces (such as Arial or Verdana), 
although she also had intermittent spontaneous seizures. Her observation was 
confirmed electrophysiologically by asking her to read the first three pages of Charles 
Dickens’ novel, A Tale of Two Cities, typed in Times New Roman or Arial typefaces. 
Weaver suggested that the presence of serifs constituted more complex visual infor- 
mation that led to the activation of the hyperexcitable neuronal network responsible 
for her seizures. 

The neurological condition of synaesthesia, in which perceiving an object in one 
mode stimulates the perception of a quite different mode, provides another example. 
A common form is grapheme-—colour synaesthesia, in which specific characters, 
while printed in black, are seen as coloured. Weaver and Hawco (2015) described a 
patient who tended to perceive the letters // (as in silly) in a vivid blue colour. The 
effect was more vivid for words presented in serif typefaces than for words presented 
in sans serif typefaces. However, if the // was printed in a sans serif typeface but the 
rest of the word was printed in a serif typeface, the effect was more vivid than if the 
ll was printed in a serif typeface but the rest of the word was printed in a sans serif 
typeface. These findings show that serifs have a functional role that is not superfluous 
to letter recognition. This objection also applies to the idea that serifs only serve as 
visual noise (or as “cluttering incoming visual information”, as was suggested by 
Woods et al., 2005, p. 97). 
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3.4 Conclusions 


This chapter introduced Part I of this book by summarising the attitudes of 20th- 
century typographers, who almost without exception considered that serif typefaces 
were easier to read than sans serif typefaces when printed on paper. During the 
twenty-first century, any dissenting voices have mainly come from journal editors 
in medicine and bioscience, who have tended to recommend the use of sans serif 
typefaces for the contents of their journals but have not provided any supporting 
evidence. This chapter also considered but dismissed the idea that serifs are purely 
decorative and superfluous to the task of identifying individual letters. 
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Chapter 4 ®) 
The Legibility of Letters and Words creek 


4.1 Reading Letters and Words in Serif and Sans Serif 


Typefaces 


The earliest experiments on the legibility of printed material were concerned with 
the relative legibility of individual letters presented in isolation in either uppercase 
or lowercase in the same serif typeface using the short-exposure method or the 
distance method. Other research was concerned with the relative legibility of charac- 
ters presented in different serif typefaces. However, some researchers included one 
or more sans serif typefaces together with a range of serif typefaces in investigations 
of the legibility of letters and words: 


Griffing and Franz (1896) measured the “illumination threshold” of letters consti- 
tuting from one to four words in a line. In this method, the distance between a faint 
light source and the material was progressively reduced until the letters could be 
correctly reported. They included uppercase letters in both thick and thin versions 
of the sans serif typeface Block. 

Roethlein (1912) used the distance method to measure the legibility of indi- 
vidual letters presented in the same typeface and included the sans serif typefaces 
Franklin Gothic and News Gothic. 

Pyke (1926) measured the legibility of both meaningful and meaningless strings 
of letters presented in the same typeface, including the sans serif typeface Lining 
Grotesque. He used a speed-of-reading test, a letter-cancellation task in which 
participants had to cross out all occurrences of the letters e and f in a page of 
nonsense material, and a task which involved reading aloud coherent text (pp. 47— 
58). 

Paterson and Tinker (1932) used a speed-of-reading test in which the participants 
had to read short passages and in each case to identify a word that conflicted with 
the passage’s meaning. They used a number of different typefaces including the 
sans serif typeface Kabel Light. 
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e Webster and Tinker (1935) employed the distance method to measure the legibility 
of individual words in different typefaces and also included Kabel Light. 

e Luckiesh and Moss (1937) measured the visibility threshold of individual lower- 
case letters and included a sans serif typeface in light, medium, and bold font. 
They did not identify this typeface, but Lund (1999, p. 116) suggested that it was 
Kabel. 

e Luckiesh and Moss (1942, pp. 159-162) carried out a similar experiment and 
included the sans serif typeface Metrolite No. 2. 


None of these researchers focused upon the difference between serif and sans serif 
typefaces, but the results that they presented indicate that the legibility of sans serif 
typefaces was not markedly different from the legibility of serif typefaces that were 
in common use at the time. 

Ovink (1938) carried out two experiments to investigate the typographical factors 
that might influence the legibility of printed letters. His first experiment used the 
short-exposure method, and he presented individual lowercase letters in isolation 
in both serif and sans serif typefaces (pp. 23-37). The second experiment used a 
version of the distance method in which the printed material was presented in a 
fixed location and the participants approached it in gradual steps until it could be 
perceived and reported correctly. The material consisted of individual uppercase and 
lowercase letters presented in isolation in one of several sans serif typefaces or in one 
of two serif typefaces (Lo and Poster Bodoni) (pp. 38-71). The results that Ovink 
obtained using both methods show that the legibility of the sans serif typefaces was 
not markedly different from that of the serif typefaces. 

Korean is another language where the alphabet can be rendered in either serif 
(or Ming) typefaces or sans serif (or Gothic) typefaces. Two studies have compared 
the legibility of the two kinds of typeface, but they yielded contradictory findings. 
Kong et al. (2011) asked ten older and ten younger adults to read aloud sets of four 
one- or two-syllable letters of varying sizes and then to rate how much discomfort 
they had experienced when reading each set on a 4-point scale. The sets of letters 
were presented either on paper or on a computer screen using either an unspecified 
Ming typeface or an unspecified Gothic typeface. When the letters were presented 
on paper, the participants’ reading speed was faster and their discomfort was less 
with the sans serif typefaces than with the serif typefaces. Nevertheless, the relevant 
differences were small in magnitude and unlikely to be of practical importance. The 
results were similar in both age groups. 

Kim et al. (2015) presented 14 Korean students with pairs of two-syllable words 
printed side by side in 26-point type. On each trial, one member of the pair had been 
designated as the target, while the other was a distractor, and the participants’ task 
was to read aloud the target in each pair. The pairs of letters were presented either 
on paper or on the screen of a smartphone in one of two serif typefaces (Batung 
or Gungseo) or in one of two sans serif typefaces (Dodum or Gulim). Afterwards, 
the students were asked to rate the typefaces of the stimuli in terms of their ease 
of reading, their familiarity, and their comfort. When the words were presented on 
paper, the participants’ reading time was significantly faster for the serif typefaces 
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than for the sans serif typefaces. Despite the pattern of results for their response 
times, they gave the highest ratings to the sans serif typeface Dodum and the lowest 
ratings to the serif typeface Gungseo. 

Wilkins et al. (1996) had devised the Rate of Reading Test for children with reading 
difficulties. The child was presented with a display of 10 lines, each consisting of a 
random ordering of the same 15 common words. For instance, the first line of one 
such display was “come see the play look up is cat not my and dog for you to”, 
except that the spacing between successive words was only 0.36 mm. This made the 
display resemble horizontal stripes and thus rendered it visually stressful. (Horizontal 
stripes are known to induce eye strain, visual illusions, headache, and—in people with 
photosensitive epilepsy—seizures: A. Wilkins et al., 1984.) The researchers argued 
that the use of random ordering minimised the linguistic and semantic aspects of 
reading that tended to be emphasised in more conventional reading tests. Children 
were timed while they read aloud the words in each display and were scored on the 
number of words that they had read correctly per minute. The original version of the 
Rate of Reading Test only used the serif typeface Times. However, Svensson (2019) 
developed a Swedish version of the test and administered it to 45 adults aged between 
22 and 83 years. The test was administered twice in Times New Roman and twice 
in Times Sans Serif, a sans serif variant designed by Mundo da Lua. The average 
reading speed was 168 words/min in both conditions, and the difference between 
them was not statistically significant (p = 0.54). 


4.2 The “Stripiness” of Printed Words 


Wilkins et al. (2007) suggested that the legibility of letters or words might depend 
upon their shape and, in particular, upon the extent to which letters’ vertical strokes 
were relatively evenly spaced, a phenomenon that typographers refer to as their 
rhythm but which Wilkins et al. referred to less formally as their “‘stripiness”’ (i.e., 
the extent to which an image of a word approximated a pattern of vertical stripes). 
They suggested that this could be measured by the height of the first peak of the 
autocorrelation between an image of a word and a second, horizontally displaced 
image of the same word. They explained this measure by asking readers to imagine 
two identical transparencies containing a single word placed on top of one another 
on an overhead projector. 


When the transparencies are in register [i.e., exactly in line], a maximum amount of light will 
be transmitted through the combined transparencies. . . . If the top transparency is moved 
horizontally across the bottom transparency, the amount of light transmitted is initially 
reduced because the letter strokes in one version of the word block the spaces in the other 
version. As the displacement continues, however, and neighbouring letter strokes come into 
register, so the amount of light transmitted increases. As the top transparency is displaced 
still further, the amount of light transmitted once again decreases and then increases again. 
The light transmitted varies with horizontal position according to a function with peaks and 
troughs. This function is, in effect, the horizontal autocorrelation. (pp. 1788-1789). 
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As examples, Wilkins et al. gave the words “mum” and “over”. (In academic texts, 
these words would normally be rendered in an italic font. On this occasion, I have 
presented the words in a regular font with inverted commas to help readers to appre- 
ciate the differences in the words’ shape.) The former has fairly evenly spaced vertical 
strokes, high periodicity, and a relatively high first peak (high stripiness), but the latter 
has very few vertical line elements, low periodicity, and a relatively low first peak 
(low stripiness). 

Wilkins et al. asked ten students to rate each of 40 common words printed in the 
serif typeface Times New Roman in terms of their stripiness on a scale from 0 (not 
at all stripy) to 10 (very stripy). They found that their mean rating for each word was 
highly correlated with the first peak in its horizontal autocorrelation (r = 0.688). 
In short, “words with a high first peak in the autocorrelation were rated as having a 
striped appearance” (p. 1791). Wilkins et al. then asked 32 university students and 
staff to read aloud 22 common monosyllabic words. The words were divided into 
those with high and low first peaks and were printed either in a single column or 
as a random paragraph of 18 lines in either Times New Roman or the sans serif 
typeface Arial. There was a large effect of autocorrelation, such that words with a 
high first peak were read more slowly than were words with a low first peak. Wilkins 
et al. confirmed this finding in two experiments using words with no ascenders or 
descenders that were printed in either Times New Roman or the sans serif typeface 
Geneva. (This indicated that the difference in reading speed was not due to the 
presence or absence of ascenders or descenders.) They also confirmed this finding in 
two experiments where participants silently scanned passages of randomly ordered 
words with the aim of finding pairs of target words. 

In addition, Wilkins et al. compared the first peak in the horizontal autocorrelation 
of 1,000 words printed in different typefaces of similar x-heights. The value of the 
first peak in the Times New Roman was very highly correlated with its value in the 
serif typeface Palatino (r = 0.95), but it was less highly correlated with its value in 
the sans serif typeface Arial (r = 0.68). The first peak tended to be highest in Times 
New Roman, somewhat less in the sans serif typeface Lucida Sans, and lowest in 
the serif typeface Palatino and the sans serif typeface Arial, although the differences 
were small in magnitude. Wilkins et al. used the same corpus of 1,000 words to 
compare the horizontal autocorrelation in the serif typefaces Times New Roman 
and Tahoma, the sans serif typefaces Arial and Verdana, and the slanting sans serif 
typeface Sassoon Primary (discussed in Sect. 7.4). They did not report the detailed 
findings, but Verdana had the lowest first peak. 

In two of their experiments, Wilkins et al. directly compared the reading times for 
different typefaces. Random paragraphs were read significantly more quickly in the 
sans serif typeface Geneva than in the serif typeface Times New Roman. However, 
there was no significant difference between the reading speeds in Times New Roman 
and the sans serif typeface Arial, regardless of whether the words were presented in 
a single column or in random paragraphs. In short: 


e Different words printed in the same typeface vary in the first peak of their 
horizontal autocorrelation (or vertical stripiness). 

e The same words printed in different typefaces vary in the first peak of their 
horizontal autocorrelation. 
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e Words of low vertical stripiness are read more quickly than are words of high 
vertical stripiness. 


However, the results obtained by Wilkins et al. leave it uncertain whether these 
phenomena lead to variations in how quickly words in different typefaces are read. 
Subsequently, Wilkins and his colleagues carried out further research using words 
presented on computer monitors, and this is described in Sect. 11.2. 


4.3 Confusions Among Letters in Serif and Sans Serif 
Typefaces 


Many early studies found that errors in the tachistoscopic recognition of individual 
letters and words were the result of confusions among visually similar letters (for 
a review, see Vernon, 1931, pp. 114-120, 145-150, 158-159). In the light of such 
evidence, Legros (1922, p. 11) claimed that serifs made letters easier to discriminate 
and identify. However, Vernon (1929) noted that other studies had found that, in 
reading connected text, words tended to be perceived in a holistic manner rather than 
letter-by-letter, and hence confusions among individual letters should be much less 
important. She presented adults with different kinds of material using a tachistoscope. 
She found that the proportion of errors based upon similarity of appearance declined 
from 82% for groups of unrelated words to 14% for longer sentences, whereas the 
proportion of errors based upon similarity of meaning increased from 2% for groups 
of unrelated words to 57% for longer sentences. 

Vernon argued that, “when the meaning of the material read was fully compre- 
hended, typographical errors were few in tachistoscopic reading, and would be negli- 
gible in normal reading” (p. 35). Elsewhere, Vernon (1931, pp. 171—172) concluded 
that young children beginning to read might be liable to confuse visually similar 
letters but that this was of much less importance in normal adults’ reading. An 
implication of this is that, even if the presence or absence of serifs influences the 
discrimination or identification of individual letters, it should have little or no impact 
upon the reading of connected text by literate adults. 

Tinker (1963, p. 36) argued that in practice serifs might serve either to enhance 
or impair the relative differentiation of individual lowercase letters. Harris (1973) 
presented individual lowercase letters tachistoscopically either to the left or to the 
right of the point of fixation in the sans serif typefaces Gill Medium and Univers 
Medium or in the serif typeface Baskerville. He found that different letters were 
more likely to be confused when presented in Baskerville. He suggested that serifs on 
letters with a single vertical stroke (such as i, j, and /) rendered them more distinctive 
and hence less likely to be confused with one another. On the other hand, he also 
suggested that serifs on letters with more than one vertical stroke (such as h, n, and u) 
rendered them less distinctive and hence more likely to be confused with one another. 
Beier and Dyson (2014) obtained analogous results using artificial typefaces with a 
version of the distance method. However, Vernon’s (1929) findings would imply that 
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such confusions would be much less likely if letters were presented in the context of 
meaningful text, as in normal reading. 


4.4 Measuring Visual Acuity 


Some of the earliest charts for measuring visual acuity were developed by Snellen 
(1862) (see Fig. 1.4 in Sect. 1.3). These contained rows of uppercase letters and 
single digits based on a 5 x 5 grid; this yielded a slab serif style akin to a typeface 
that was then known as Egyptian Paragon, in which the width of the main strokes and 
the width of the serifs were one fifth the height of a letter. (Both were the size of the 
cells in the 5 x 5 grid.) Successive rows contained increasing numbers of symbols 
of decreasing size, and visual acuity was scored according to the smallest row that 
could be read accurately in each eye. 

Over the next century, other researchers developed versions of these charts using 
different layouts and sequences of letters; some followed Snellen in using a slab serif 
style, whereas others adopted a sans serif style (see Bennett, 1965, for a review). 
Cowan (1928) asserted that “Gothic” (sans serif) letters were more easily distin- 
guished than “block” (slab serif) letters (p. 290), but he provided no reference or 
any other source for this assertion. Hetherington (1954) tested the visual acuity of 
100 boys aged 8-17 using an unspecified version of a Snellen chart; he noted that 
different letters of the same size varied in their legibility, but he concluded that the 
boys’ errors were generally the result of confusions among visually similar letters. 

In fact, since the 1950s, charts with sans serif styles of lettering have been 
widely adopted for measuring visual acuity. Examples include those devised by 
Sloan (1959), the British Standards Institute (1968), Deederer (1968, 1970), and 
Bailey and Lovie (1976). The British Standards Institute (1968) commented that 
this development in measures of visual acuity was “in keeping with modern trends 
in typography” (p. ii), while Deederer (1968) remarked that it was appropriate for 
testing drivers, since they would frequently encounter sans serif lettering on traffic 
signs. 

Richards (1965) compared the visual acuity of 103 volunteers who were tested 
using Sloan’s sans serif chart and a Snellen chart containing lettering with slab serifs; 
people who had slight uncorrected astigmatism found the letters with slab serifs more 
confusing under conditions of low luminance, but otherwise there was very little 
overall difference between the results obtained using the two styles. Richards (1978) 
subsequently replicated these findings with a sample of 175 volunteers stratified by 
age between 16-25 years and 66-75 years (that is, the sample contained 25 volunteers 
from each decade of the adult life span). Bailey and Lovie’s (1976) instrument is 
known as the LogMAR chart (the acronym stands for Logarithm of the Minimum 
Angle of Resolution), and nowadays it is generally regarded as the most accurate 
measure of visual acuity. 
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4.5 Conclusions 


The earliest research on the legibility of different typefaces was concerned with recog- 
nising individual letters and words under different conditions. The vertical “stripi- 
ness” of individual words can be defined in terms of their horizontal autocorrelation. 
This seems to affect how quickly they can be read, but it is unclear whether this leads 
to differences among typefaces. There is a separate line of research concerned with 
evaluating visual acuity, going back to the construction of optical charts in the middle 
of the nineteenth century. In both fields of research, the most common finding—the 
modal finding—is that there are no differences in the legibility of letters and words 
printed in serif and sans serif typefaces. Confusions among visually similar letters 
were originally considered to be a primary determinant of legibility, but these appear 
to be less important when skilled readers are presented with meaningful text. 
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Reading and Comprehending Text get 


5.1 Reading Text in Serif and Sans Serif Typefaces 


Ovink (1938) included a sans serif typeface (Futura) as well as a variety of serif 
typefaces in experiments where participants read lines of text presented for a limited 
exposure (pp. 84—88) or where their eye movements were monitored while they 
were reading passages held in their hands in clear view (pp. 88—100). In the latter 
case, Ovink used a mechanical apparatus in which a rubber pad rested on one of 
the participants’ upper eyelids. He also included a sans serif typeface (Gill Sans) as 
well as three serif typefaces in a further study where the short-exposure method was 
used to present material in the form of dictionary entries and where the participants 
subsequently rated the clarity of the typefaces that had been used (pp. 100-106). 
There were no marked differences between their responses to the sans serif typefaces 
and their responses to the serif typefaces. 

Wendt (1969, 1994) carried out an investigation to compare the effects of typo- 
graphic factors, including the legibility of a serif typeface (Bodoni) and a sans serif 
typeface (Futura). He constructed 16 passages in German and presented each in five 
columns across a sheet of paper. He asked roughly 2,000 students to read one of the 
passages and recorded the number of words read in 3 min. There was a slight mean 
advantage of 8.62 words for passages in the Futura typeface, but this did not achieve 
statistical significance. Wendt argued that modern readers were equally familiar with 
both styles of typeface. Indeed, in a subsequent survey of Australian students, the serif 
typeface Press Roman was ranked only marginally higher in their overall preference 
than the sans serif typeface Univers (Bell & Sullivan, 1981). 

Taylor (1990) arranged for the instructors of four remedial reading classes at a US 
high school to administer reading tests to groups of students aged 15—16. Over three 
weeks, they were timed reading excerpts from their reading workbooks, one printed 
in the sans serif typeface Helvetica, the other printed in the serif typeface Times 
Roman. (As was mentioned in Sect. 1.2, this is a similar typeface to Times New 
Roman that was developed by a rival printing company; it is sometimes known just 
as “Times”.) Across 74 students, the median difference in their reading rate scores 
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on the two typefaces was zero (p. 50). Over the next two weeks, they were given 
sheets of paper containing other excerpts printed in both of the typefaces, and they 
were asked to choose one version to read aloud. There was no significant difference 
in their preferences in either week. 

A major research issue is that actual examples of serif and sans serif typefaces tend 
to differ on a variety of other characteristics (Beier & Larson, 2010). Arditi (2004) had 
devised software to generate typefaces that differed only in the presence or absence 
of slab serifs and in the size of the serifs. Arditi and Cho (2005) used this tool to 
construct lowercase typefaces of uniform thickness with slab serifs that extended for 
0% (sans serif), 5%, or 10% of their cap height. Two individuals with normal vision 
and two with impaired vision were asked to read aloud three “passages” in which 
400 words had been randomly ordered to yield “scrambled” text. Arditi and Cho 
calculated each participant’s “reading speed” by dividing the number of characters 
in the correctly read words by the time taken to read each passage. The participants 
with normal vision obtained higher scores than those with impaired vision, but there 
was no significant variation in their reading speed as a function of serif size and a 
fortiori no significant effect of the presence or absence of the slab serifs. As Arditi 
and Cho noted, the small number of participants was a major limitation of their study. 


5.2 Comprehending Text in Serif and Sans Serif Typefaces 


In Sect. 2.2, it was noted that asking participants to read continuous meaningful text 
provides less opportunity for researchers to impose experimental control over their 
reading behaviour. To address this issue, some researchers have focused on their 
participants’ comprehension of such material rather than upon its legibility per se. 
This could be justified on the grounds that, in order to be comprehended, written 
material must first be read; thus the legibility of a text places an upper limit on how 
much of it can be comprehended. A different argument was put forward by Gasser 
et al. (2005). They suggested that, insofar as a message was easy to read, fewer 
attentional resources would need to be devoted to reading the message, leaving more 
resources available for the processing of the information that it contained. However, 
Reynolds (1979) argued that the value of comprehension tests in the measurement 
of legibility was questionable, insofar as comprehension implicated cognitive skills 
of a higher order than those that were required for the accurate perception of the 
text. Wilkins et al. (1996) also claimed that conventional reading tests tended to 
emphasise the linguistic and semantic aspects of reading rather than the purely visual 
aspects (see Sect. 4.1). Thus, factors affecting comprehension might not be relevant 
to measuring legibility. 
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Fox (1963) compared two typefaces intended for use on typewriters: Standard 
Elite, a conventional serif typeface; and Gothic Elite, a sans serif typeface in which 
lowercase letters are replaced by small capitals. Both typefaces are monospaced 
or non-proportional (each character occupies the same width), and small capitals in 
Gothic Elite occupy the same width as lowercase letters in Standard Elite. The partici- 
pants read two passages, one in each typeface, silently but as quickly as possible. After 
reading each passage, they were required to recount the story that it contained, and 
their comprehension was rated as “good” or “poor”. There was no significant differ- 
ence in the time taken to read passages in the two typefaces or in their comprehension 
of their content. 

Poulton (1965) asked 375 adult volunteers to read two passages of about 450 words 
in the same typeface. They were allowed 90s to read each passage and were given 
a test of their comprehension of ten key points from the passage. The passages had 
been printed in seven different typefaces: three serif typefaces (Baskerville, Bembo, 
and Modern) and four sans serif typefaces (Gill Medium, Grotesque 215, and two 
versions of Univers). On the first passage, Poulton found no significant differences 
among the comprehension scores, which he ascribed to the participants becoming 
familiar with the general procedure and the particular typeface that they had to read. 
On the second, there was significant variation among the sans serif typefaces but not 
among the serif typefaces; more important, none of the serif typefaces yielded scores 
that were significantly different from those of any of the sans serif typefaces. 

A fundamental question using this paradigm is whether a test on the content of a 
passage administered immediately after its presentation is a test of comprehension 
or simply a test of factual recall or verbatim memory (Hartley et al., 1975). Poulton 
and Brown (1967) used the same procedure as in Poulton’s (1965) study, but they 
remarked that their measure of “comprehension” was “more correctly described as 
a measure of memory” (p. 219). They found that requiring the participants to read 
a passage aloud led to poorer performance on the early key points in the passage 
but to better performance on the last key point in comparison with requiring the 
participants to read the passage silently. Unfortunately, Poulton and Brown had not 
matched the key points and their associated questions for difficulty, and consequently 
the theoretical interpretation of these results remains unclear. 

Soleimani and Mohammadi (2012) evaluated different typefaces in Iranian 
students who had been selected for having an intermediate proficiency in English. 
This included a reading comprehension test based on a passage from a widely used 
English-language textbook: it was presented in the sans serif typeface Arial for 42 
students and in in the serif typeface Bookman Old Style for 47 students. There was no 
significant difference between the two groups in their reading speed, in an immediate 
test of their reading comprehension, or in a multiple-choice test of their memory for 
ten key points from the passage that was administered 2 weeks later. 

Serif and sans serif typefaces are also used in some non-Western alphabets. 
Akhmadeeva et al. (2012) asked 238 Russian medical students to read a passage 
about the history of neurology in Russia. The passage had been printed in Cyrillic 
script using ParaType, a family of artificial typefaces: 108 students were shown the 
passage in a serif typeface, and 130 students were shown the passage in a sans serif 
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typeface with the same x-height. The students were given | min to read the passage 
and were then asked ten multiple-choice questions about its content. Akhmadeeva 
et al. found no sign of any difference either in the mean number of words that the 
two groups had read or in the mean number of questions that they had answered 
correctly. 


5.3 The Connotative Meaning of Typefaces 


Some researchers argued that typefaces could serve as carriers of connotative 
meaning (reflecting their associations with different attitudes, experiences, and 
emotions) as well as carriers of denotative meaning (reflecting factual information). 
Subjective impressions of the legibility of different typefaces can be regarded as just 
one aspect of their connotative meaning. (German-speaking writers sometimes refer 
to this quality as their Atmosphdrenwert or “atmosphere value”. North American 
writers sometimes talk about the “personality” of different typefaces.) This aspect 
might in principle affect their legibility for different readers. In this kind of research, 
participants are asked to report on their experiences and preferences when reading 
material printed in different typefaces. Once again, examples can be presented either 
individually or in groups of two or more for comparison, and the self-reports can be 
collected either informally (for instance, through interviews) or more formally (for 
instance, through the use of rankings or rating scales). 

As an example of a formal approach, Tinker and Paterson (1942) asked different 
groups of participants to arrange samples of ten different typefaces in order from the 
most legible to the least legible and from the most pleasing to the least pleasing. They 
found that their judgements of legibility and pleasantness demonstrated “remarkable 
agreement” (p. 40), to the extent that they could be regarded as being equivalent to 
one another. As Dreyfus (1985) pointed out, readers’ preferences may be irrelevant if 
they have no choice in whether or not to read something (such as an airline schedule 
or a railway timetable) but may be crucial if they can choose what they read (as in 
product information or voting literature). 

Another example of a formal approach is the semantic differential. This was 
devised by Osgood et al. (1957) to measure participants’ attitudes to objects and 
concepts. The participants provide evaluations of these using bipolar rating scales; 
typically, these are 7-point scales in which the middle category is neutral between 
the two poles. Their responses are subjected to factor analysis to yield higher-order 
dimensions that are regarded as reflecting underlying aspects of connotative meaning. 
Research studies in a wide variety of domains and cultures converged on three overar- 
ching dimensions that reflected variations in peoples’ attitudes: evaluation (good vs. 
bad), potency (strong vs. weak), and activity (active vs. passive) (Osgood et al., 1975). 
Hofstatter (1966) devised a similar methodology for German-speaking countries, 
which he described as a Polaritdtsprofile (polarity profile). 
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5.4 Connotations of Serif and Sans Serif Typefaces 


Connotative meaning is potentially important in the world of advertising. Berliner 
(1920) initiated this line of inquiry by asking students to rank order different hand- 
lettered styles on different dimensions for advertising different products. She found 
that their rankings of appropriateness were different for different products. Subse- 
quent researchers confirmed this when using actual typefaces (Davis & Smith, 1933; 
Poffenberger & Franken, 1923; Schiller, 1935). They found no clear difference 
between the ratings given to serif and sans serif typefaces; other features seemed 
to be more important in determining the appropriateness of different typefaces to 
different products. Ovink (1938, pp. 127—177) noted that these studies had only used 
typefaces that were in common use in advertising displays in the United States. He 
chose 17 typefaces, including some that were more widely used in Europe. He asked 
68 participants to rate the appropriateness of each of the typefaces for advertising 
purposes on eight dimensions. The 17 typefaces varied on most dimensions, but there 
was no clear difference overall between serif typefaces and sans serif typefaces. 

Using the semantic differential methodology developed by Osgood et al. (1957), 
Tannenbaum et al. (1964) presented participants with the English alphabet printed in 
both uppercase and lowercase in both upright and italic fonts in two serif typefaces, 
Bodoni and Garamond, and two sans serif typefaces, Spartan and Kabel. A total of 
75 participants rated each display on 25 scales. There were no significant differences 
between the ratings given to the serif typefaces and those given to the sans serif 
typefaces. 

Wendt (1968) asked 70 participants to rate 35 typefaces using an adapted version 
of Hofstatter’s (1966) semantic differential. Eleven were sans serif typefaces (seven 
variants of Folio and four variants of Futura). Each typeface was presented on a 
printed card containing the alphabet in lowercase, the alphabet in uppercase, and the 
ten single digits. Each was evaluated by ten participants on 63 7-point rating scales. 
Factor analysis was used to reduce their ratings to four broad dimensions, but there 
were no clear differences between the serif and the sans serif typefaces on these 
dimensions. Cluster analysis of the ratings yielded three clusters, but each of these 
contained both serif and sans serif typefaces. In other words, the participants’ ratings 
differentiated among the 35 typefaces, but they did not differentiate systematically 
between the serif typefaces and the sans serif typefaces. 

Benton (1979; Rowe, 1982) asked 24 participants to evaluate ten typefaces on 
26 bipolar 7-point scales. Five were general typefaces in general use, including one 
sans serif typeface (Helvetica); five were novelty typefaces. Each was presented on a 
printed sheet containing the alphabet in lowercase, the alphabet in uppercase, the ten 
single digits, and common punctuation marks. Factor analysis was used to reduce 
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their ratings to five broad dimensions, which Benton labelled “potency”, “elegance”, 
“novelty”, “antiquity”, and “evaluation”, and which she used to calculate scale scores 
for the ten typefaces. The serif typefaces (Bodoni, Garamond, Palatino, and Times 


Roman) did not differ significantly from each other on any of the five dimensions. 
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Moreover, Helvetica only differed from these typefaces on antiquity, where it was 
seen as being relatively modern. 

Bartram (1982) conducted a similar study in which 38 design students and 52 
students of other disciplines were asked to rate 12 typefaces on 18 bipolar scales. A 
factor analysis of the ratings provided by the design students yielded four dimensions: 
the three hypothesised by Osgood et al. (1957) (evaluation, potency, and activity) and 
a fourth dimension concerned with mood. Bartram then compared the scores on these 
dimensions given to the 12 typefaces by the design students and the other students. 
They included three regular upright typefaces: the serif typeface Times New Roman 
and the sans serif typefaces Futura Medium and Univers 67. Bartram did not compare 
the ratings of these typefaces directly, but their profiles were relatively similar. 

Morrison (1986) obtained ratings of four serif typefaces (Egyptian, Modern, Old 
Style, and Transitional) and one sans serif typeface (Contemporary) on 25 bipolar 
dimensions taken from Tannenbaum et al. (1964). There were three groups, each 
of 14 participants: typography students, technology students, and students of other 
subjects. They were presented with “text” consisting of statistical approximations 
to English in all five typefaces in three weights and in both upright and italic font. 
There were no significant differences among the three groups in terms of their mean 
ratings on seven scales that included the three primary factors identified by Osgood 
et al. (1957). The only significant difference between the serif typefaces and the 
sans serif typeface was that the latter received slightly higher ratings than the former 
on the potency factor, which Morrison attributed to sans serif typefaces being used 
on public signs that had an association with authority (such as freeway and airport 
signage). 

Tantillo et al. (1995) asked 250 students to evaluate examples of six typefaces 
on 28 bipolar 7-point scales. There were three serif typefaces (Century Schoolbook, 
Goudy Old Style, and Times New Roman) and three sans serif typefaces (Avant Garde 
Gothic, Helvetica, and Univers). Significant differences emerged on 26 scales: “The 
serif type styles... are rated as more elegant, charming, emotional, distinct, beautiful, 
interesting, extraordinary, rich, happy, valuable, new, gentle, young, calm, and less 
traditional than the sans serif type styles. Serif styles have more personality, freshness, 
high quality, vitality, and legibility, but the sans serif group is more manly, powerful, 
smart, upper-class, readable, and louder than the serif styles” (p. 452). 

These results are anomalous when compared with the findings of earlier research. 
A procedural difference is that Tantillo et al. only presented the nonsense word 
“NRESTA” in uppercase as the example of each typeface to be evaluated, but other 
studies had used longer examples involving both uppercase and lowercase letters. 
The rating forms that Tantillo et al. employed often showed the more positive pole on 
the left end of each scale, but it sometimes appeared on the right end, which may have 
led to a degree of confusion on the students’ part. This might explain two curious 
findings: first, sans serif typefaces are usually regarded as being more modern in their 
appearance than serif typefaces, yet Tantillo et al.’s participants rated serif typefaces 
as being younger and less traditional; second, the sans serif typefaces were rated 
as being significantly more readable but as significantly less legible than the serif 
typefaces. 
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5.5 Conclusions 


A number of studies have evaluated the role of typographic variables (including 
the presence or absence of serifs) in reading continuous text. Asking participants 
to read continuous text allows less scope for experimental control, and so other 
researchers have instead focused on participants’ comprehension of written material. 
In both cases, the modal finding is that there are no significant differences between 
text printed in serif typefaces and text printed in sans serif typefaces. Subjective 
impressions of the legibility of different typefaces can be regarded as one aspect of 
their connotative meaning, and other researchers have asked participants to evaluate 
typefaces on different dimensions using single rating scales or semantic differentials. 
The modal finding is that there are no significant differences in readers’ overall 
preference between serif and sans serif typefaces, nor any significant differences in 
the connotations of serif and sans serif typefaces. 
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6.1 The Importance of Context 


Whittemore (1948) argued that the legibility of different typefaces depended upon 
the context in which they were used. Readers may develop expectations with regard 
to the kinds of context in which particular typefaces are appropriate. A problem with 
much of the research described thus far is that it did not provide readers with any 
sensible context for their reading (Schriver, 1997, p. 277). Some researchers have 
endeavoured to address this issue. 

Zachrisson (1965, pp. 156-162) investigated the attitudes of experts and non- 
experts to whether material was printed in serif or sans serif typefaces. Typography 
experts and students from various subject areas were shown samples for each of six 
themes. A serif typeface was preferred for a wedding invitation, a perfume adver- 
tisement, and the title page for a book of lyrical verse, but a sans serif typeface was 
preferred for an invitation to an art exhibition, the title page for a book on modern 
architecture, and an advertisement for an oil stove. The rankings given by the experts 
and the non-experts were relatively similar. 

Hvistendahl and Kahl (1975) prepared four newspaper stories of 250—400 words 
in four serif typefaces (Imperial, News #2, News Bold, and Royal) and four sans serif 
typefaces (Futura, Helvetica, News Sans, and Sans Heavy). Each of 200 subjects was 
asked to read at their normal reading pace two stories in serif typefaces and two stories 
in sans serif typefaces; in each case, one story was printed in 10.5-point type, while 
the other was printed in 14-point type. For two stories, the serif typefaces yielded 
a significantly faster mean reading time than the sans serif typefaces; for the other 
two stories, there was no significant difference in the reading times. Hvistendahl and 
Kahl then showed each participant two out of eight stories set in both a serif typeface 
and a sans serif typeface; in each case, one story was printed in 10.5-point type, and 
the other was printed in 14-point type. The participants were asked to express their 
preference between the two typefaces in which each story was printed. Overall, the 
serif typefaces were preferred 68% of the time, while the sans serif typefaces were 
preferred only 32% of the time. Nevertheless, it should be noted that Hvistendahl 
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and Kahl did not ask their participants to compare stories set in serif and sans serif 
typefaces of the same size. 

Moriarty and Scheiner (1984) asked 260 college students to read a page of text 
from a sales brochure for stereo speakers. Half the students read the text in a serif 
typeface (Times Roman), and half read it in a sans serif typeface (Helvetica). Inde- 
pendent of this, half the students read the text in regular spacing, and half read it with 
an 18% reduction in spacing. They were given 105 s to read the text and marked the 
last word that they had read when the time limit was reached. The students given a 
close-set type read significantly more than those given the regular type. However, 
there was no significant difference between the students who read the text in a serif 
typeface and those who read it in a sans serif typeface and no significant interac- 
tion between the two variables. Moriarty and Scheiner concluded that there was no 
difference in reading speed between the serif and sans serif typefaces in their study. 


6.2 Serif and Sans Serif Typefaces in Newspaper Headlines 


The headlines placed above articles in the body of a newspaper are usually presented 
in a large typeface, often in a bold font, and they may extend over two or more 
lines. They may be presented (a) all in capitals, (b) with initial capitals for the 
principal words (“title case”), or (c) with initial capitals only for the first word and 
any proper nouns (sometimes known as “sentence case”). Research in the first half 
of the twentieth century showed that text presented in lowercase was read more 
quickly than text presented in uppercase and more specifically that headlines in 
title case were read more quickly than headlines all in capitals (see Tinker, 1963, 
pp. 186—190). Newspaper headlines are sometimes presented in sans serif typefaces. 
Arnold (1956) argued that “Sans Serif... is highly readable and, more than any other 
typographic device, conveys an impression of a newspaper that is alert and up to 
date” (p. 19). 

English (1944) presented three-line headlines in title case tachistoscopically for 
450 ms to 45 students of journalism and psychology. He used a serif typeface (Bodoni 
bold), a slab serif typeface (Karnak bold), and a sans serif typeface (Tempo medium) 
in three sizes, and he used the number of words reported correctly as a measure of 
performance. Headlines presented in Bodoni and Tempo yielded significantly better 
performance than those presented in Karnak but were not significantly different from 
each other. Variations in type size had no effect upon the participants’ performance. 
Another group of 50 students was shown pairs of headlines in different typefaces 
and asked to choose which member of the pair seemed easier to read. There were no 
significant differences among their preferences for the different typefaces. 

Haskins (1958) presented 300 participants recruited through the Saturday Evening 
Post with ten different magazine articles. The subtitles and text of the articles were 
presented in their original form, but their headlines were presented to different partic- 
ipants in one of ten different typefaces, including two sans serif typefaces (Futura 
Light and Futura Bold). The participants were asked to judge how appropriate each 
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headline was for the article to which it was attached using a 6-point scale. The vari- 
ation in their ratings across the ten typefaces was highly significant; in particular, 
Futura Light was on average judged to be fairly appropriate, whereas Futura Bold 
was judged on average to be very appropriate for eight of the articles. These results 
imply that sans serif typefaces are at least as appropriate as serif typefaces for the 
headlines of such articles. 

Click and Stempel (1968) carried out an experiment in which college students 
rated six newspaper front pages on 20 semantic differential scales (see Sect. 5.3). 
However, they had carried out a pilot study in which six front pages had used sans 
serif typefaces in their headlines and four had used serif typefaces. They found that 
sans serif typefaces and serif typefaces yielded similar ratings if they were used with 
similar front-page formats. As they noted, “The main source of variation in response 
was the format and not the typeface” (p. 130). Because of this, they only used sans 
serif headlines in their main experiment. 

Haskins and Flynne (1974) investigated whether the choice of different typefaces 
for newspaper headlines affected readers’ interest in the accompanying stories. They 
carried out interviews with 150 female heads of household, of whom 100 were 
asked to read through a genuine local newspaper in which a mock women’s page 
had been inserted. This contained five articles drawn from various newspapers and 
magazines in which the headlines had been printed either in a serif typeface judged 
to be relatively “feminine” (Garamond Italic) or in a sans serif typeface judged to be 
relatively “masculine” (Spartan Black). The remaining 50 female heads of household 
read the newspaper without the women’s page, together with the headlines of the 
five articles printed on individual white cards. The participants were asked to rate 
the attractiveness and interest of each page of the newspaper on a scale from 0 to 100 
and to rate their interest in each of the five articles. They were then shown the mock 
women’s page printed in ten different typefaces and were asked to rate each version 
using 12 semantic differential scales. 

Consistent with the researchers’ assumptions, two typefaces often used on 
women’s pages of newspapers, Garamond Italic and the cursive script Coronet Light, 
were rated more highly on several supposedly feminine characteristics, whereas 
Spartan Black, which was often used on sports pages, was rated more highly on 
several supposedly masculine characteristics. The other typefaces were rated as 
being relatively neutral on these characteristics. Nevertheless, there was no signifi- 
cant difference in the ratings of overall reading interest given to the women’s pages 
with headlines printed in Garamond Italic and in Spartan Black. Only one of the five 
articles showed a significant difference in the ratings of reading interest, where the 
version with a headline printed in Garamond Italic was rated more highly than the 
version with a headline printed in Spartan Black. Haskins and Flynne concluded that, 
while some typefaces used in headlines were perceived as more feminine or more 
masculine, this had no effect on a woman’s interest in reading a women’s page. 

In a study described in more detail in Sect. 6.3, Wheildon (1990, pp. 18-22; 2005, 
pp. 61-73) presented the same participants with articles containing headlines in 
different typefaces and evaluated their perceived legibility by asking the participants 
to say simply whether the headlines were easy to read or not. When the headline was 


46 6 Reading in Context 


presented in a lowercase serif typeface, 92% said that it was easy to read; when it was 
presented in a lowercase sans serif typeface, 90% said that it was easy to read. (He 
did not specify the exact typefaces that he had used, and he did not explain whether 
“lowercase” meant title case or sentence case.) Wheildon (1990, p. 22; 2005, p. 72) 
concluded that there was little to choose between serif and sans serif typefaces in 
headlines. 


6.3 Wheildon’s Research 


Colin Wheildon (1984) carried out a study of the impact of various typographical 
factors on the comprehension of newspaper copy that incorporated a comparison 
between serif and sans serif typefaces. He prepared revised versions of his orig- 
inal report in 1986 and 1990, and he also incorporated his account into a book on 
typography and design (Wheildon, 1995). This in turn went through several editions 
and revisions, and the final version was published in 2005. Wheildon’s research has 
been cited in support of the idea that serif typefaces are more legible than sans serif 
typefaces in continuous text (Kempson & Moore, 1994, pp. 52, 284; Schriver, 1997, 
p. 274). It has, however, proved to be extremely controversial (Poole, 2012). 

Wheildon recruited 300 volunteers from among the inhabitants of Sydney, 
Australia, and he visited them in their homes on several occasions. Their educational 
level tended to be higher than that of the general population (79% had graduated 
from high school and 23% had obtained a university degree or a comparable qual- 
ification), but none was professionally involved in printing or publishing. On each 
visit, they were asked to read a mock newspaper article to a time limit and were 
then asked ten questions to test their comprehension of its content. For each visit, 
the participants were randomly divided into two equal subsamples who read the 
article in different forms, and they were classified into three groups depending on 
the number of questions that they had answered correctly: between seven and ten 
questions, good comprehension; between four and six questions, fair comprehen- 
sion; and between zero and three questions, poor comprehension (Wheildon, 1990, 
p. 9; 2005, pp. 134-138). 

At two of the visits, the bodies of the relevant articles were presented in either a 
serif typeface (Corona) or a sans serif typeface (Helvetica). The sequence of adminis- 
tration of the two typefaces was counterbalanced across different participants, and so 
the comprehension of the two typefaces was compared within the same individuals. 
The results were analysed for the 224 participants who had participated at all of the 
visits. When reading an article in serif typeface, comprehension was scored as good 
for 67%, fair for 19%, and poor for only 14%; however, when reading an article in 
sans serif typeface, comprehension was scored as good for only 12%, fair for 23%, 
and poor for 65% (Wheildon, 2005, p. 47). 

Wheildon also asked the participants who had shown either poor comprehension 
or good comprehension “leading questions” about their attitudes to the articles and the 
layout of the pages. He commented that “these responses were collected for anecdotal 
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rather than scientific value” (Wheildon, 1990, p. 9), but he felt that they helped to 
explain some of the objective results (Wheildon, 2005, p. 138). He summarised the 
comments made by the 112 participants who had read an article intended to be 
of direct interest that had been presented in the sans serif typeface. Many of their 
comments referred to their difficulty in concentrating on the reading task. However, 
when they were then asked to read another article presented in the serif typeface, 
they reported no physical difficulties (Wheildon, 1990, p. 17; 2005, p. 48), 

In introducing his research, Wheildon (1990, p. 16; 2005, p. 46) had mentioned 
only one previous study on the legibility of serif and sans serif typefaces (Pyke, 
1926), and he did not acknowledge that the sheer size of the effect that he had found 
linking serif typefaces to better comprehension was clearly anomalous when taken 
in the context of the findings of all other research carried out up to that point. Nor 
did he comment on the apparent disparity between these results and his findings 
regarding the legibility of serif and sans serif typefaces when used in newspaper 
headlines (described in the previous section). Instead, he argued: “The conclusion 
must be that body type must be set in serif type if the designer intends it to be read 
and understood” (Wheildon, 1990, p, 17; 2005, p. 48). 

Poole (2012) argued that Wheildon’s account of his research lacked key infor- 
mation that would enable a sceptical reader to evaluate the study. The introduction 
to the expanded version of Wheildon’s (1990, pp. 9-10) report and an appendix to 
Wheildon’s (2005, pp. 133—140) book do provide additional information about his 
research methods, but some important details are unclear. For instance, the report 
states that each of the articles extended over several pages (Wheildon, 1990, p. 9). 
However, the book states that they were designed to fit in four columns 5 cm wide 
and 30 cm deep on a single page, while the examples that are provided in the book 
indicate that some space on the page was taken up by a headline, a by-line, and 
two illustrations (Wheildon, 2005, pp. 34-35). Neither account mentioned either the 
final number of words in each of the articles or the time allowed to read the articles 
and to answer the comprehension questions (Wheildon, 1990, p. 9; 2005, pp. 33-48, 
137-138). 

There are some additional issues with Wheildon’s research. First, his general 
account of the research methodology suggested that each participant was asked to 
read one article at each visit, and that comparisons were made between their compre- 
hension of different articles at different visits (Wheildon, 2005, p. 137). Neverthe- 
less, he also mentioned a group of 112 participants who were tested on an article of 
direct interest in a sans serif typeface but who were tested on “another article with a 
domestic theme” in a serif typeface immediately afterwards (Wheildon, 1990, p. 17; 
2005, p. 48). The latter arrangement is clearly more vulnerable to transfer or carry- 
over effects (for instance, due to practice or fatigue) than repeated testing separated 
by an interval of weeks or months. 

Wheildon (1990, p. 9; 2005, p. 136) had designed his materials to measure the 
effects of several different variables simultaneously. For example, half the mock 
newspaper articles were designed to be of direct or broad interest to the partici- 
pants, whereas the other half were designed to be of limited or specific interest. This 
manipulation produced a difference of 10 percentage points in the participants’ level 
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of comprehension (Wheildon, 2005, p. 36). Other variables included the use of capital 
letters in the headlines, the use of colour in the headlines or in the text, the use of 
justified versus unjustified text, and the use of italic font in the text. Wheildon (2005, 
p. 136) explained this by arguing that the different manipulations were logically 
separate from one another, and hence there was no need to change the samples of 
participants to measure their effects. However, this ignores the possibility that these 
effects were not empirically separate from one another. In other words, the apparent 
difference in comprehension between material printed in serif and sans serif type- 
faces might simply have been an artefact due to a confounding of this manipulation 
with one or more of the other variables. 

One further possibility is that of researcher bias. Wheildon’s (2005, pp. 24, 103) 
own comments make it clear that he had always had a deep antipathy towards sans 
serif typefaces. He himself had interviewed and tested all the participants in their 
own homes (p. 137), and so it is possible that his underlying attitudes to serif and sans 
serif typefaces might have (either intentionally or unintentionally) influenced how 
they had set about their task and thus might have influenced their comprehension. 
This issue should have been addressed by employing assistants to test the participants 
who were blind (1.e., uninformed) as to the specific research hypotheses. 


6.4 More Recent Research 


Schriver (1997, pp. 289-303) obtained examples of texts that might be read for each 
of four common purposes and presented each in both a serif typeface and a sans serif 
typeface with similar x-heights. The four purposes or “genres” were: (a) reading to 
enjoy (a two-page spread from a short story, presented in Bauer Bodoni and Univers); 
(b) reading to assess (a one-page business letter from a bank, presented in Palatino 
and Futura); (c) reading to do (a two-page spread from an instruction manual, printed 
in Times Roman and Helvetica); and (d) reading to learn to do (a two-page spread 
from a manual to help people estimate their taxes, printed in Garamond Light and 
Optima). Schriver presented these texts to 67 volunteers using a within-subjects 
design that counterbalanced the order of the texts and the order of the typefaces. The 
participants were asked to say which version of each text they preferred and also to 
say why. 

Similar proportions of participants chose the serif and the sans serif typefaces. 
More detailed examination showed that on balance participants tended to prefer the 
serif typefaces for the short story and the tax manual, but they tended to prefer the sans 
serif typefaces for the instruction manual and the business letter. Their preferences 
were influenced by various factors related to the rhetorical context of each text: the 
mood or tone of the text; the density of the text; the contrast among the parts of the 
text; the legibility of the text; and the quality of printing. Schriver concluded: “This 
study suggests that people find serif and sans serif typefaces equally pleasing but 
that the situation in which they are reading may lead them to prefer one style over 
the other” (p. 302). 
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McCarthy and Mothersbaugh (2002) showed a fictious advertisement about 
Ontario to 265 business students in their regular classes. Three aspects were varied 
independently and at random: serif versus sans serif typefaces taken from a family 
of artificial typefaces; 8-point type versus 10-point type; and types with x-heights of 
50% or 70% of the associated capital letters. The participants were asked to read the 
material to themselves for 1 min and to circle the word that they were reading when 
the time limit was reached. They were then asked to read a control advertisement 
following the same procedure and were classified into fast or slow readers using a 
median split on the number of words that they had read. 

The number of words that they had read from the first advertisement was analysed 
by a between-subjects analysis of variance with the independent variables of type- 
face, type size, x-height, and reading skill. The use of a serif typeface led to better 
performance than the use of a sans serif typeface, but only for fast readers reading 
small typefaces and only for fast readers reading typefaces with a large x-height. 
The contrast between serif and sans serif typefaces had no significant effect for slow 
readers, for fast readers reading large typefaces, or for fast readers reading typefaces 
with a small x-height. These results suggest that any differences in the legibility of 
serif and sans serif typefaces will only arise as a result of very specific interactions 
with the effects of other features of the typeface and of the readers themselves. 

In the studies mentioned in Sect. 5.4, Bartram (1982) and Rowe (1982) had only 
studied the connotations of typefaces when used for individual words. Even so, both 
maintained that these connotations would influence readers’ interpretations when 
the typefaces were used for regular narrative. E. R. Brumberger (2003b) set out 
to test this idea by comparing the connotations of typefaces and the connotations 
of texts in which they were used. In her first study, she asked 80 students to rate 
how much each of 15 descriptors applied to each of 15 typefaces. A factor analysis 
of their responses yielded three broad dimensions, which she labelled “elegance,” 
“directness,” and “friendliness.” Multidimensional scaling yielded a similar grouping 
of the 15 typefaces. Brumberger noted that the resulting categories were based on the 
semantic qualities of the typefaces, not their physical characteristics. In particular, 
each category subsumed both serif and sans serif typefaces (see also E. Brumberger, 
2004; Mackiewicz & Moeller, 2004). 

In her second study, Brumberger (2003b) presented another 80 students with 
15 different texts, each containing 375 words, drawn from a variety of published 
sources. The participants were asked to read each text and to rate it on the same 15 
scales. A factor analysis yielded three broad dimensions that were very different from 
those found in the first study. She labelled these new dimensions “professionalism”, 
“violence”, and “friendliness”. Brumberger argued that she had demonstrated that 
readers consistently ascribe particular personality attributes to particular typefaces 
and text passages. However, the lack of concordance between the results of the 
two studies contradicts the idea that the connotations of typefaces affect readers’ 
interpretation of the texts in which they appear. 

Brumberger (2003a) selected one typeface that represented each of the dimen- 
sions in her first study: the cursive typeface CounselorScript for elegance, the serif 
typeface Times New Roman for directness, and the sans serif typeface Bauhaus Md 
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BT for friendliness. She also selected one text that represented each of the dimensions 
in her second study: a passage from a psychology textbook for professionalism, an 
excerpt from a novel for violence, and an excerpt from an article on snowboarding for 
friendliness. Different groups of students were presented with the texts in different 
typefaces in a between-subjects counterbalanced design and were asked to rate the 
relevant text on the 15 scales used in her earlier study. There were significant differ- 
ences in the ratings given to the three texts, but no significant difference in the ratings 
given to texts in different typefaces. Brumberger suggested that the “persona” of the 
texts might have overridden that of the typefaces (p. 230). 

Gasser et al. (2005) asked 149 psychology students to read an information sheet 
about tuberculosis that was being used at a local health-care facility. They were 
presented with the information sheet in one of four typefaces: a slab serif typeface 
with monospacing (Courier), a serif typeface with proportional spacing (Palatino), a 
sans serif typeface with monospacing (Monaco), or a sans serif typeface with propor- 
tional spacing (Helvetica). (With monospacing, each character occupies the same 
width, but with proportional spacing different characters take up different amounts 
of horizontal space.) They read the material silently at their own pace and were then 
given a short attitudinal questionnaire as a distractor task. Finally, they answered six 
open-ended questions on key points in the material. The students who read the mate- 
rial in serif typefaces answered more of the questions correctly than did those who 
read it in sans serif typefaces, although the difference only just attained statistical 
significance. Gasser et al. suggested that the students had found information printed 
in serif typefaces easier to read (and thus easier to remember) because they were 
more familiar with such typefaces in their regular educational material. 

Juni and Gross (2008) asked 102 university students to read two satirical articles 
from the New York Times; one was concerned with government issues and the other 
with education policy. They then rated each article on 12 qualities. One of the articles 
was presented in the serif typeface Times New Roman, and the other in the sans 
serif typeface Arial. For the government article, the version presented in Times 
New Roman was rated as significantly more angry and as significantly less cheerful 
than the version presented in Arial. For the education article, the version presented 
in Times New Roman was rated as significantly more frivolous than the version 
presented in Arial. It should however be noted that Juni and Gross found just three 
significant differences out of a total of 24 comparisons between the two typefaces 
without controlling for the possibility of spurious results due to chance variation (i.e., 
Type I errors), suggesting that the choice of typeface generally made little difference 
to their participants’ perceptions of the two articles. 
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6.5 Conclusions 


It has been argued that the context of reading is a primary determinant of the legi- 
bility of different typefaces and the readers’ expectations of the legibility of what 
they are reading. Newspaper headlines have been used as a specific context in which 
researchers have studied the legibility and connotations of different kinds of text. 
Wheildon (1990, 2005) presented an extensive programme of research on the legi- 
bility of different kinds of text. However, his research has come under extensive crit- 
icism and suffers from further issues that have not been noted in previous research. 
Several researchers have subsequently considered the effect of variations in type- 
faces and the expectations of readers in different kinds of situations. In general, 
research on reading in context provides no convincing evidence for any difference 
in the legibility of serif and sans serif typefaces. Nevertheless, there is a suggestion 
that readers’ preferences and the connotations of serif and sans serif typefaces might 
well vary between different contexts. 
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Chapter 7 A) 
Younger and Older Readers rie 


7.1 Younger Readers 


Research with young children is of interest in the present connection. First, as novice 
readers, young children may be disproportionately sensitive to variations in typo- 
graphic variables such as the presence or absence of serifs. Second, the preparation 
of educational material in either serif typefaces or sans serif typefaces might affect 
the facility with which children acquire the ability to read using such material. 

Children’s books were generally printed in serif typefaces until the 1930s 
(Walker & Reynolds, 2003). A notable exception was the educational material 
published by Nellie (Ellen) Dale in collaboration with the artist Walter Crane in 
the United Kingdom in the 1890s and early 1900s (Dale, 1902b, 1903; for discus- 
sion, see Brockington, 2012). Young children were introduced to individual letters 
and combinations of letters printed in a sans serif typeface and carried out various 
exercises that involved writing on blackboards or slates using coloured chalk as well 
as reading letters aloud (Dale, 1903, pp. 15-18). They were next introduced to a series 
of readers or “primers” containing groups of new words of increasing phonological 
complexity that were once again presented in a sans serif typeface. Each group of 
new words was then used in a short narrative that was printed in a serif typeface 
and accompanied by a relevant illustration. Examples are shown in Fig. 7.1 (see also 
Dale, 1902a; Walker, 2013, pp. 88-89). Dale’s work was mentioned by Burt (1959, 
p. 8; 1960) in reviews of the literature, but her work now seems to have been almost 
completely forgotten. 

Subsequently, many educational publishers produced their own typefaces 
consisting of “infant characters” modified for novice readers. For instance, the loops 
in characters such as a and g are only partially circular in sans serif typefaces (see 
the right-hand panel of Fig. 1.1 in Sect. 1.2) but are often more completely circular 
when rendered as infant characters. The assumption appears to have been that chil- 
dren should learn the simpler shapes of letters printed in a sans serif typeface in 
developing their handwriting before learning more complex shapes printed in a serif 
typeface (Coghill, 1980; Walker, 2013, pp. 31-35 et passim; Watts & Nisbet, 1974, 
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73. 74. bench trench 

bent rent bend | tench French 

| tent went mend | “On that bench is a lad, 
Kent sent lend catching tench. It is Dan!” 

S : 3 i It is getting foggy and Dan 


| lent spent send | thinks it best to run back. 


Phyllis and Betty went to > Bump! Bump! pump) 
Kent on a visit to Granny. “What was that? 

Granny has given them a “Dan slipped and fell into a 
shilling to spend. Phyllis spent trench. Let us run! Unless a 
| sixpence on a packet of candy, l helping hand is given, the ina 
and is sending six sticks to | cannot get on to the bank. 
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a present. Phyllis thinks Lily’s and Dan was drenched. 
wax dolly wants a 
sun-bonnet. Betty 
will get a pink sun- 
bonnet and send it 
in a box to Lily. 


eo 


Fig. 7.1 Two pages from one of Dale’s Readers. From The Dale readers: Infant reader (new ed.), 
by N. Dale. 1902. George Philip & Son. In the original, the illustrations and some letters in the sans 
serif headings were rendered in colour. Reproduced by kind permission of the Philip’s Division 
of Octopus Publishing Group and Sue Walker from http://www.bookdata.kidstype.org/database/dat 
abase/getImage?id=1018 


p. 33). (This is, of course, exactly the pedagogical approach that had been promoted 
by Dale.) As a result, teachers nowadays tend to prefer sans serif typefaces over serif 
typefaces (Raban, 1984; Walker & Reynolds, 2003), and many books for younger 
readers are printed in sans serif typefaces (Bluhm, 1991; Walker, 2013, pp. 33-35). 


7.2 Burt and Kerr’s Research 


One study often cited in support of the idea that serif typefaces are more legible than 
sans serif typefaces (e.g., Gallagher & Jacobson, 1993; Schriver, 1997, p. 274) was 
described by a psychologist, Cyril Burt, who collaborated with a physician, James 
Kerr. The research involved ten groups of children, each consisting of 15 boys and 
15 girls aged 10-11 years. It was apparently carried out between 1913 and 1917, 
but the findings were not reported until after Burt’s retirement in the 1950s (Burt, 
1959; Burt et al., 1955). The final experiments used ten different serif typefaces. Burt 
et al. (1955) mentioned that they had carried out initial experiments with four other 


7.2 Burt and Kerr’s Research 55 


typefaces, including a sans serif typeface, but that these had proved “much inferior” 
to those that had been selected (p. 32). Burt (1959) explained that “in our own early 
experiments Dr Kerr and I [7] found almost at once that, for word recognition, a sans 
serif type face was the worst of all” (p. 9). 

The reference in square brackets was to Kerr’s (1926) account of his own findings. 
Kerr mentioned the possibility of using letters in a sans serif typeface, but he simply 
asserted, without argument or evidence, that “owing to irradiation they are not as 
legible as letters with thicker ends” (p. 552). (On the topic of irradiation, see Sect. 1.2.) 
No further information was provided regarding experimental comparisons between 
serif and sans serif typefaces in either Burt’s account or Kerr’s. Moreover, Hearnshaw 
(1979, pp. 227-261) concluded that there were serious doubts about the validity and 
authenticity of the data presented in Burt’s publications, while Hartley and Rooum 
(1983) argued that this was true in particular of his work on typography. In fact, Burt 
et al. (1955, p. 32) claimed that the sans serif typeface which they had used was Gill 
Sans (see also Burt, 1959, p. 9), but this was not made available until 1928 (Kinross, 
1992, pp. 62-63), which was more than a decade after Burt and Kerr had supposedly 
carried out their research. 


7.3 Zachrisson’s Research 


The first systematic analysis of the legibility of serif and sans serif typefaces among 
children was carried out by Zachrisson (1965, pp. 97-108). Groups of 36 boys in 
Grade 1 (aged 7—8 years) were drawn from two Swedish schools. At one school, 
reading instruction was based on material printed in a serif typeface; at the other 
school, it was based on material printed in a sans serif typeface. The boys were tested 
individually and were asked to read aloud two pages of text. They were randomly 
assigned to receive text that was printed in one of two serif typefaces (Bembo or 
Nordisk Antikva) or one of two sans serif typefaces (Gill Sans or Mager Konsul). 
Zachrisson analysed the number of errors made when reading the text. There was 
a significant variation among the four typefaces, but the overall difference between 
the serif typefaces and the sans serif typefaces was not significant, and the difference 
between the two schools (and hence the typefaces used for reading instruction) was 
not significant. 

In a subsequent experiment (pp. 109-115), Zachrisson drew a sample of 24 boys 
and 24 girls from Grade 4 (aged 10-11 years) of the school where instructional 
material was printed in a sans serif typeface. They were asked to read silently two 
different passages, but their reading was interrupted from time to time by comprehen- 
sion questions. Once again, the passages were printed in one of two serif typefaces 
(Bembo or Fairfield) or one of two sans serif typefaces (Fin Grotesk or Gill Sans). 
Each participant read one passage in a serif typeface and the other in a sans serif 
typeface. There was no overall difference in the time taken to read each passage 
between the serif typefaces and the sans serif typefaces, and there was no significant 
variation among the four typefaces. 
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For his next experiment (pp. 115-121), Zachrisson presented individual words in 
isolation by means of a tachistoscope; alternate words were presented in a serif type- 
face (Mediaeval) and in a sans serif typeface (Mager Futura). The sample consisted 
of 12 boys in Grade 1 at each of the schools involved in the first experiment, together 
with six boys and six girls in Grade 4 at the school involved in the second experi- 
ment. The words were presented for 40 ms to the younger children and for 20 ms 
to the older children. Zachrisson analysed the number of words correctly reported, 
with partial credit for words that were incorrectly reported. There was no significant 
difference between the two typefaces for either the younger children or the older 
children, and no significant difference between the two schools. 

Zachrisson employed the same research design and materials in a further study 
that employed Weiss’s (1917) focal variator (pp. 121-124), which was described 
in Sect. 1.2. In this case, he analysed the threshold at which the blurred image of 
a word was correctly recognised. The difference between the two typefaces was 
not significant for either the younger children or the older children. The difference 
between the two schools for the children in Grade 1 was highly significant, but 
Zachrisson did not report the direction of the difference. Zachrisson also adopted 
the same research design using a perimeter, which is a device for presenting visual 
material in peripheral vision (pp. 124—128). This study failed to yield any significant 
results, and he inferred that this was not a useful way to measure the legibility of 
typefaces. 

Zachrisson had also asked the children who participated in his first two exper- 
iments to rank order the four typefaces in which the passages had been presented 
(pp. 131-132). They were instructed as follows: “The point is to say which of these 
you find most legible—inviting, pleasant, easy, to read, and in which order you want 
to put them according to their legibility. Which one do you like best, next best, third 
and least?” (p. 131). There was no significant difference in the ranks assigned to 
the four typefaces, no significant difference between the children in Grade 1 and 
the children in Grade 4, and no significant difference between children in Grade 1 
at the two schools (and hence between the typefaces used for reading instruction). 
Zachrisson concluded on the basis of all his findings that “there is no significant 
difference, in objectively measured legibility, or subjective opinion regarding ease 
of reading between the OF [serif] and SS [sans serif] type faces” (p. 132, italics in 
original). 


7.4 Other Research with Children 


Weiss (1978, 1982) asked 145 boys and girls in Grades 3 and 6 (aged 8-9 and 
11-12) at two public schools in New York City to express their preferences among 
printed material that differed in page size, type size, and typeface. Each example 
was printed as a two-page spread of text but consisted simply of familiar words 
in a random sequence together with an arbitrary illustration presented in the top, 
middle, or bottom of the right-hand page. Weiss chose three typefaces that were easily 
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discriminable: a sans serif typeface (Futura) and two serif typefaces (Paladium and 
Parinesy). The children were divided into three ability groups based on their scores 
in an achievement test and were interviewed individually about their perceptions 
and preferences regarding the different page sizes, the different typefaces, and the 
different positions of the illustrations. 

Of the three factors, the typeface was regarded as important by children in Grade 
3 but not by children in Grade 6, by boys but not by girls, and by children of medium 
ability but not by children of low or high ability. Children of low and middle ability 
tended to prefer the Futura sans serif typeface, whereas children of high ability 
tended to prefer the Paladium serif typeface. There was no significant difference 
in preferences between boys and girls or between the children in different grades. 
When asked to give reasons for their preferences, the children mainly referred to the 
legibility and the attractiveness of the printed material. As Weiss noted, this pattern 
is consistent with the results obtained by Tinker and Paterson (1942) in the case of 
adult readers. 

Coghill (1980) carried out an informal study in which 38 children aged about 
5 years who were being taught to read using materials printed in a sans serif typeface 
(Gill Sans) were asked to read aloud sentences printed in that typeface or in a number 
of serif typefaces: “In almost every case the children found little difficulty in reading 
alternative typefaces” (p. 257), and any reading errors tended to be repeated across 
the different typefaces. 

Sassoon (1993) described a study in which 50 8-year-old children were shown a 
short passage in four different typefaces and four different settings and were asked 
to choose the typeface that they liked best and found easiest to read. There were 
two serif typefaces, Times Roman and Times Italic, and two sans serif typefaces, 
Helvetica and a slanting sans serif typeface that Sassoon herself had developed. 
Their preferences were fairly evenly distributed, and Sassoon argued that children 
were able to assimilate different typefaces as a result of their exposure to television 
graphics and other kinds of advertising. These findings led her to promote her own 
typeface, Sassoon Primary, in material for young readers (Bluhm, 1991). However, 
using two different procedures, Wilkins et al. (2009) found that children read words 
in this typeface less quickly (both aloud and silently) than they read words of the 
same x-height in the sans serif typeface Verdana. Wilkins et al. suggested that this 
was because, with Sassoon, neighbouring letters tended to use strokes that were more 
similar in shape. 

De Lange et al. (1993) asked 160 schoolchildren to read two pages of text and to 
mark all occurrences of a particular word. Half received two pages in the same serif 
typeface (Times Roman), but the other half received the first page in Times Roman 
and the second page in a sans serif typeface (Helvetica). Each group contained equal 
numbers of children from each of four schools, equal numbers of children from Years 
4 and 6, and equal numbers of children with high and low academic performance. De 
Lange et al. calculated the scanning speed on each page by dividing the number of 
marked targets by the scanning time and then calculated the gain in scanning speed 
from the first page to the second. There was no sign of any significant difference 
between the two conditions in the gain scores obtained by the children in either 
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year. De Lange et al. concluded that there was no significant difference between the 
legibility of Times Roman and Helvetica as measured by a scanning process. 

Walker and Reynolds (2003) presented excerpts from a children’s reading book 
to 24 children aged 5-7 years in either a serif typeface (Century) or a sans serif 
typeface (Gill Sans) with or without infant characters. The typefaces were balanced 
across the four excerpts in different children. Walker and Reynolds measured the time 
taken to read each excerpt aloud as well as the errors that the children made in doing 
so. There was no significant variation in the time taken or in the number of errors 
made across the four typefaces. Walker and Reynolds took these results to confirm 
Coghill’s (1980) view that children do not find non-infant characters problematic. 
The children were also asked to express their preferences among the typefaces: eight 
expressed no preference, but eight expressed a preference for Gill Sans without infant 
characters, and five expressed a preference for Gill Sans with infant characters. 

Ripoli (2015) noted that in many Spanish-speaking countries children are taught 
to read using material printed in a cursive typeface before moving on to serif and 
sans serif typefaces. He tested 115 children who had been taught to read using a 
cursive typeface in the final year of preschool. They were asked to read aloud six 
short texts taken from a children’s book in Spanish, each in one of six typefaces: a 
cursive typeface (Escolar 1), two serif typefaces (Sylfaen and Times New Roman), 
and three sans serif typefaces (Arial, Lexia Readable, and Comic Sans). Assignment 
of the six typefaces to the six texts was counterbalanced across different children, 
and the x-height of the lowercase letters in each typeface was matched to that of Arial 
14-point type. There was significant variation across the six typefaces in the number 
of incorrect line breaks made by the children, but no significant variation in the 
number of words read correctly per minute or in the number of errors that they made. 
Ripoli observed that, despite having been taught to read using a cursive typeface, 
the children had no difficulty reading using non-cursive typefaces, even using Lexia 
Readable (which they had not previously encountered). There was also no difference 
in their performance on the serif typefaces and on the sans serif typefaces. 

Griffiths (2020) devised the Comparative Rate of Reading Speed Test. This 
involved two displays, each consisting of 13 lines with 60 characters in each line. 
The characters were random groups of between one and seven lowercase letters. The 
first display was printed in black in the serif typeface Times, and the second display 
was printed in teal in the sans serif typeface Gill Sans. A total of 92 children aged 
11-12 years were asked to read the characters aloud and were timed on their reading 
of the fifth line in each display. The mean times were 40.53 s for the Times display 
and 34.81 s for the Gill Sans display. The difference between these mean times was 
highly significant. Griffiths commented that the use of teal for the Gill Sans display 
had been “a concession to light sensitive subjects” (p. 11). He suggested that the 
result was due to binocular deficiency (i.e., unstable co-ordination of the eyes), but 
this only affects 15% of the general population (Hargreaves, 2008). He acknowl- 
edged that the effect of typeface had been confounded with that of colour, and he 
argued that the latter was the more important variable, possibly due to a reduction in 
contrast. However, since all of the children saw the displays in the same order, the 
result might simply have represented a practice effect. 
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7.5 Letter Reversals 


It has been known for more than a century that children who are learning to read tend 
to confuse pairs of lowercase letters that are mirror images of each other (e.g., b and 
d; p and q) (Mach, 1897, p. 50). This appears to be true in all cultures that use the 
Western alphabet in reading and writing (Goikoetxea, 2006). It is mainly apparent 
in the final year of kindergarten and the early years of compulsory education, and 
it is apparent in a variety of tasks extending beyond reading aloud and writing to 
dictation (Thompson, 2009). Such errors are found in normal readers as well as in 
children with learning disabilities and children who are dyslexic; nevertheless, they 
become less common in older children as their reading develops (Cossu et al., 1995; 
Davidson, 1936; Kennedy, 1954). (They are sometimes seen in older neurological 
patients and occasionally in healthy older adults: Balfour et al., 2007.) 

In the regular forms of most sans serif typefaces, the relevant letter pairs are 
exact mirror images of one another (see Fig. 1.1 in Sect. 1.2). Some authors have 
argued that the addition of serifs enhances legibility by making individual letters 
more discriminable from one another (e.g., Legros, 1922, p. 11; McLean, 1980, 
pp. 42-44). Yule (1988) and Wiebelt (2004) argued in particular that serifs help to 
differentiate between confusable letter pairs because they are no longer exact mirror 
images of each other. (For instance, the left-hand panel of Fig. 1.1 shows that, in both 
the letters b and d, the serifs are on the left-hand side of the ascenders.) However, 
other authors have argued that, even with the addition of serifs, the relevant pairs of 
lowercase letters do not differ appreciably from each other (e.g., Potter, 1949, p. 11). 
Indeed, Lockhead and Crist (1980) found that more explicit cues were needed to 
enable children to discriminate between such letter pairs. 

Evaluating the two positions is difficult, because most researchers who have 
described letter reversals in young children have not specified the typeface used 
in their reading tests. It is therefore impossible to say whether the children had been 
asked to read letters printed in serif or sans serif typefaces. Popp (1964) did specify 
the use of a serif typeface (Century) to present lowercase letters in a two-alternative 
forced-choice experiment where children had to match letters presented on a touch- 
sensitive projection screen. Their error rate was highest on the pairs b-d and p-q, 
thus confirming that mirror reversals still occur with letters that are presented in a 
serif typeface. 

In the study mentioned in Sect. 7.4, Ripoli (2015) examined the number of letter 
reversals made by 115 children when reading six texts printed in six different type- 
faces. The number of letter reversals was least for texts printed in the cursive typeface 
Escobar | and the sans serif typeface Lexia Readable, where there are additional cues 
that serve to differentiate the critical pairs of lowercase letters. However, there was 
no significant variation in the number of letter reversals for texts printed in the other 
four typefaces: two serif typefaces (Sylfaen and Times New Roman) and two sans 
serif typefaces (Arial and Comic Sans). These results imply that, in the absence of 
additional cues, the presence of serifs does not in itself help children to discriminate 
between pairs of letters that are otherwise mirror images of each other. In general, 
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mirror reversals in children’s reading and writing relate to structural properties of 
the relevant alphabet and not to the particular typeface used to render that alphabet 
(Treiman et al., 2014). 


7.6 Older Readers 


Vanderplas and Vanderplas (1980) suggested on the basis of interviews carried out 
with older people that many did not read as much as they would have liked because 
of difficulties with illegible type, although some publishers do produce large-type 
versions of newspapers and books intended for older readers. The researchers asked 
28 volunteers aged between 60 and 83 to read 30 passages of 30-33 lines taken from 
Samuel Butler’s novel, Erewhon. The 30 passages were presented in one of five type 
sizes from 10 to 18 points and in one of six typefaces. Three were serif typefaces 
(Century Schoolbook, Times Roman, and Bodoni), and three were sans serif type- 
faces (Helvetica, Spartan, and Trade Gothic). The type sizes and the typefaces were 
counterbalanced, but the order of the passages reflected the narrative structure of the 
novel. After reading each passage, the participants were given a short test of their 
comprehension to ensure that they had actually read the material, and they then rated 
the passage on six aspects of its presentation using a 7-point scale. 

Their reading speed was significantly faster for passages with serif typefaces 
than for passages with sans serif typefaces, but there was also significant variation 
in reading speed across the six typefaces: Century Schoolbook yielded the fastest 
reading speed, but Spartan yielded the slowest. Their reading speed generally tended 
to increase with the type size. The participants also rated passages with serif typefaces 
more positively than passages with sans serif typefaces on their apparent size, how 
easily they could be read, and how easily they could be understood. They also rated 
12-point typefaces as being the easiest to read. 

One situation in which older readers encounter problems is in reading labels 
on their medication, regardless of whether the labels are prepared using dot-matrix 
printers (Zuccollo & Liddell, 1985) or more advanced laser printers (Watanabe et al., 
1994). Smither and Braun (1994) asked 19 younger adults (mean age 25.48 years) 
and 20 older adults (mean age 71.05 years) to read the labels on 18 prescription 
bottles printed in a serif typeface with proportional spacing (Century Schoolbook), 
a monospaced slab serif typeface (Courier), or a sans serif typeface with propor- 
tional spacing (Helvetica). They read more slowly and made more errors on the 
labels printed in Courier than on the labels printed in either Century Schoolbook or 
Helvetica. The older adults also read more slowly and made more errors than the 
young adults when reading labels printed in Courier. 

Smither and Braun suggested that the participants might have had problems 
because of the curvature of medication bottles. They repeated their experiment with 
new participants and with the 18 labels placed on a flat surface. These participants 
once again read the labels printed in Courier more slowly, but they did not make 
more errors than on the labels printed in Century Schoolbook or Helvetica. The 
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older adults read more slowly than the young adults across the board, but they did 
not make more errors. Smither and Braun inferred that reading medication labels was 
more effective if they were printed with proportional spacing (Century Schoolbook 
or Helvetica) than if they were printed with monospacing (Courier). However, the 
presence or absence of serifs seemed to have little or no effect on the legibility of 
medication labels. 


7.7 Conclusions 


As novice readers, young children may be disproportionately affected by different 
typefaces. The use of different typefaces may also affect how readily children acquire 
the ability to read. Research by Burt (1959), Burt et al. (1955) and Kerr (1926) is 
often cited in support of the idea that serif typefaces are more legible. However, 
their accounts are inadequate and contain many contradictions. Zachrisson (1965) 
provided a more thorough account of the role of typographic variables in reading 
among children of different ages using various research methods and found no 
evidence for any difference in legibility between serif and sans serif typefaces. Subse- 
quent research by other investigators has tended to confirm Zachrisson’s conclusions. 
It has been known for more than 100 years that children tend to confuse letters that 
are mirror images of each other (such as p and q). This phenomenon occurs with 
both sans serif letters (which are true mirror images) and serif letters (which are not). 
Older readers tend to suffer from visual problems which may depend on typograph- 
ical factors. This is of practical importance, as in the design of labels for medication 
containers. Nevertheless, there are no differences in the reading capability of older 
readers when presented with material printed in serif and sans serif typefaces. 
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Chapter 8 A) 
Readers with Disabilities E 


8.1 Readers with Visual Impairment 


Nolan (1959) tested 264 children with visual impairment aged between 8 and 20. Half 
of the children counted as legally blind, but the others did not. Within each group, the 
children were randomly assigned to different subgroups to read material presented 
in either 18-point type or 24-point type in either a serif typeface (Antique with Old 
Style) or a sans serif typeface (Metrolite Medium). The material itself consisted of 72 
paragraphs, each of about 90 words; after reading each paragraph, the children had 
to answer a short comprehension question by choosing one of five alternative words. 
Nolan measured the number of paragraphs that they had read in 30 min, statistically 
adjusted for variations in their reading comprehension. 

The results showed that the children with better vision read faster than the children 
with poorer vision, and that the material presented in Antique with Old Style was 
read more quickly than the material presented in Metrolite Medium There was no 
significant difference in reading speed between the two type sizes, and there were 
no significant interactions among the effects of reading ability, type size, and type- 
face. Nolan commented that Metrolite Medium had needed a greater line length than 
Antique with Old Style, and she suggested that it was this feature rather than the 
absence of serifs that explained the slower reading time. In fact, Nolan had standard- 
ised the line length at 8.5 in. (21.6 cm) in all four sets of material. Consequently, 
the material that was presented in Metrolite Medium would have needed more lines, 
which suggests that it was the number of lines rather than the presence or absence 
of serifs that had given rise to differences in reading performance between the two 
typefaces. 
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8.2 Shaw’s Research 


During the 1960s, Alison Shaw was commissioned by the UK Library Association 
(now the Chartered Institute of Library and Information Professionals) to study the 
reading capabilities of people designated as partially sighted, defined as those who 
had difficulty reading normal sized book print but whose sight could not be fully 
corrected by the use of spectacles. The findings of the research were published as a 
formal report (Shaw, 1969). Shaw also published three short journal articles about the 
study, but these omit key information about her research methods and data analyses, 
and so this account is based on her formal report. 

Shaw decided to investigate her participants’ capacity for reading continuous 
printed text at a close reading distance (p. 9). She chose to investigate separate 
samples of adults and children, with a primary focus on the former (p. 17). She 
identified four aspects of the text which might be important: typeface, type weight, 
type size, and type spacing. With regard to typeface, she compared Plantin, a serif 
typeface which had been developed in 1913 by the Monotype Corporation, and Gill 
Sans, the sans serif typeface developed in 1928 by Eric Gill. She argued that these 
were representative examples of serif and sans serif typefaces that were relatively 
similar in terms of their x-height and their width (p. 23). With regard to weight, 
she compared the medium and bold versions of these two typefaces. With regard 
to size, she used 12-, 14-, 16-, 18-, 20-, and 24-point sizes. Each participant was 
tested to determine their visual acuity, and they received material in two successive 
point sizes that were just below and just above their acuity threshold (e.g., 16-point 
and 18-point sizes) (p. 24). With regard to spacing, she used four combinations of 
inter-letter, inter-word, and inter-line spacing (pp. 24—25), which yielded 32 different 
combinations of typeface, weight, size, and spacing. 

Each participant received four of these 32 combinations. The materials consisted 
of short passages of common words that were combined to yield grammatical but 
semantically anomalous sentences (e.g., “Hungry bridges describe expensive farm- 
ers”: p. 32). Each of the passages consisted of six sentences printed in five lines 
containing 38—41 characters. One passage was assigned to each of the 32 condi- 
tions, but a pre-test was carried out using ten readers with normal vision to ensure 
that the 32 passages were of comparable difficulty (p. 33). The conditions assigned 
to different participants and the order of their administration were counterbalanced 
using Graeco-Latin squares (p. 29). 

Shaw measured the number of words in each passage that were read aloud correctly 
and the time taken to read them. She used this information to calculate the average 
time per correct word and normalised the latter by dividing it by the average time 
per word taken by the normal readers to read each passage. There was a positive 
correlation between the mean and the variability of the normalised data, and so 
the logarithms of these data were used. Finally, the four transformed values for 
each participant were expressed as deviations about their mean value in order to 
eliminate differences among the participants (p. 37). Multiple regression analyses 
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were then used to identify the factors that were responsible for significant variations 
in performance. 

A total of 288 adults with visual impairment were recruited through government 
agencies and voluntary associations. Shaw compared their characteristics with those 
of the population according to national statistics, and she concluded that the sample 
was broadly representative of the national population of adults with partial sight 
(p. 53). The multiple regression analysis found that an increase in type size from 
the smaller size to the larger size led to an improvement of reading performance of 
16%; an increase in the type weight from medium to bold led to an improvement 
of 9%; changing the typeface from serif to sans serif led to an improvement of 4%; 
and variations in spacing did not lead to a significant change in reading performance 
(p. 50). Shaw noted that her findings contradicted “traditionalist views that a serif 
face is always more legible than a sans serif’ (p. 61), and she concluded that the 
effect of differences between the two typefaces was of minor importance compared 
with the effects of size and weight (p. 65). 

Shaw compared various subgroups in the sample of adults with visual impairment. 
The most common causes of visual impairment (some adults had more than one 
cause) were: 


e Cataract. This condition involves progressive cloudiness in the lens of the eye. 

e Glaucoma. This condition involves damage to the retina or optic nerve. 

e Macular degeneration (often known as age-related macular degeneration). This 
causes impaired vision in the centre of the visual field. 

e Myopia (short-sightedness). In its severe form, this may be due to glaucoma or 
retinal detachments. 


Comparing these subgroups, changing the typeface from serif to sans serif had 
significantly benefited participants with macular degeneration but not the other three 
subgroups (p. 50). Independent of this, 72 exhibited a congenital impairment, having 
been affected since birth or early childhood, whereas 112 exhibited an acquired 
impairment, having been affected since the age of 50 (p. 43). Changing the typeface 
from serif to sans serif had benefited the latter but not the former. 

Shaw also recruited 48 children with visual impairment who had a reading age of 
at least 11 from schools for partially sighted children. Once again, Shaw compared 
their characteristics with those of the national population, and she concluded that 
they were representative of older, brighter children in schools for partially sighted 
children (p. 56). The multiple regression analysis found that an increase in type 
weight from medium to bold led to an improvement of reading performance of 7%, 
but that variations in type size, typeface, or spacing did not lead to a significant change 
in reading performance (p. 50). The most common causes of visual impairment were 
myopia, cataract, and nystagmus (which causes visual impairment by disrupting the 
normal pattern of eye movements), but these subgroups were too small for any further 
statistical analysis. After they had read the passages, the children were asked which 
of the four examples they thought was the clearest. There was no correlation between 
the children’s personal subjective judgement and their objective reading performance 
(p. 49). 
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Shaw concluded that the sans serif typeface was slightly more legible than the 
serif typeface for her adult readers, but that there was no measurable difference for 
the children (p. 65). This might be taken to suggest that the advantage of the sans 
serif typeface was due to the adults’ increased familiarity and experience in reading 
such typefaces. However, there are two problems with Shaw’s study. First, although 
she carried out a large number of statistical tests on her data, she did not control 
for the possibility of Type I errors. As a result, some of the statistically significant 
differences which she found might have been spurious results due to chance variation. 

Second, and more fundamentally, Shaw’s focus on readers with visual impair- 
ment meant that she ignored the performance of the normal readers in her study. In 
normalising the performance of her participants on each passage against the perfor- 
mance of normal readers on the same passage, she ignored the possibility that the 
normal readers themselves might have shown significant differences between the serif 
and sans serif typefaces. Suppose that the readers with visual impairment produced 
reading speeds of 50 words/min on a passage printed in Plantin and 60 words/min on 
a passage printed in Gill Sans, and that the normal readers produced reading speeds 
of 100 words/min and 120 words/min on these passages. Normalising the former 
data would have yielded scores of 0.5 for both typefaces. (Taking logarithms and 
deviations about the mean value would not have changed this situation.) In other 
words, the normalisation process might well have obscured differences in legibility 
in both normal readers and those with visual impairment. Unfortunately, Shaw did 
not present the descriptive statistics to enable readers to establish whether or not this 
was a possibility. 


8.3 Children in Special Education 


Pittman (1976) compared 48 children with learning disabilities and 48 nondisabled 
children in their reading comprehension. She showed them stories consisting of five 
paragraphs and then administered ten sentence-completion questions. Each story 
was presented over four trials. The children in each group received stories in one of 
four typefaces: one was in a serif typeface (Pica), two were in sans serif typefaces 
(Gothic and Primary), and one was in a cursive typeface (Script). Not surprisingly, 
the children’s performance increased over the four trials. The nondisabled children 
obtained higher scores than did the children with learning disabilities. However, 
there was no significant difference among the children’ scores on the four typefaces. 
Pittman concluded “that the style of type is not an important variable in the reading 
comprehension of LD [learning-disabled] and normal children” (p. 115). It might 
be noted that the results obtained by the comparison group of nondisabled children 
confirm the conclusion of Chap. 7 that there is no difference in the legibility of serif 
and sans serif typefaces among normal young readers. 

Section 7.4 mentioned a study by Sassoon (1993) in which nondisabled children 
were shown a short passage in four different typefaces and were asked to choose 
the typeface that they liked best. Sassoon repeated this study with 50 children who 
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were in special education and aged between 8 and 13. Apart from describing them as 
having “learning difficulties” (p. 158), she did not mention what their special needs 
actually were. In this case, there were clear differences: 44% chose the slanting sans 
serif typeface, 28% chose the serif Times Italic, 18% chose the sans serif Helvetica, 
and 10% chose the serif Times Roman. Sassoon attributed the low preference for 
Times Roman to the fact that its pronounced serifs and short descenders affected 
the identification of certain letters. However, in Sect. 7.4 it was noted that Sassoon 
had developed the slanting sans serif typeface herself and had been promoting it 
for use in material for young readers. It would have been better if Sassoon had 
employed assistants who were blind as to the specific research hypotheses to avoid 
the possibility of researcher bias. 

Haugen (2010) tested 14 children in special education in Grades 3—6 (aged 8-11). 
Once again, apart from describing them as having “mild special education needs” 
(p. 17), she did not discuss what their special needs actually were. She asked them to 
read aloud four passages of between 260 and 440 words printed in different typefaces: 
Bookman, a serif typeface; Comic Sans, a sans serif typeface based on comic-book 
lettering; Helvetica, a more regular sans serif typeface; and Times, another serif 
typeface. Each passage took about 5—10 min to read. However, Haugen did not time 
the children exactly but instead monitored their behaviour. Finally, she showed them 
all four of the passages that they had read and asked them to say which style of letters 
they had found the easiest to read and which they liked the best (pp. 83-93). 

All the children completed all four passages, but they appeared more restless when 
reading the passages in Comic Sans and Times than when reading the passages in 
Bookman and Helvetica. Haugen commented that the issue was not the difference 
between serif and sans serif typefaces but the design of each typeface when compared 
with that of the others. The number of words read incorrectly was similar across the 
four typefaces, although Times yielded the fewest while Comic Sans yielded the most. 
The children were more likely to skip words printed in the two serif typefaces than 
those printed in the two sans serif typefaces, whereas they were more likely to pause 
when reading words printed in the two sans serif typefaces than when reading words 
printed in the two serif typefaces. Finally, they were more likely to run together two 
successive sentences printed in Comic Sans than those printed in the other typefaces 
(pp. 121-128). 

Nine of the 14 participants thought that one of the serif typefaces was the easiest 
to read, and five thought that one of the sans serif typefaces was the easiest to read, 
four of whom chose Comic Sans. However, their choice did not seem to bear any 
relationship to their actual reading performance. Moreover, 12 of the 14 participants 
chose a sans serif typeface as the one they liked the best (of whom eight chose 
Comic Sans), one chose a serif typeface, and one insisted on choosing a serif typeface 
and a sans serif typeface (pp. 129-139). Haugen argued that their preferences had 
been influenced by their prior experience of sans serif typefaces on game systems, 
computers, cellular phones, and other electronic devices (pp. 164—165). 
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8.4 Readers with Congenital Visual Impairment 


Uysal and Diiger (2012) evaluated the effects of a 3-month training programme in 35 
children with visual impairment at a Turkish primary school. Their preferences for 
different typefaces were assessed before and after the programme by showing them a 
20-word sentence in Turkish in five different typefaces (Arial, Comic Sans, Tahoma, 
Times New Roman, and Verdana) and ten different type sizes. Their reading speed 
was averaged across all the typefaces and sizes. Before the programme, they preferred 
the sans serif typefaces such as Verdana (11 children) or Arial (8 children) as opposed 
to the serif typeface Times New Roman (3 children). The programme initially adopted 
their preferred typeface but gradually incorporated the other typefaces. It led to a 
significant increase in both their reading speed and their writing speed, but not in the 
judged legibility of their handwriting. After the programme, most children indicated 
that they were comfortable with any of the typefaces except for the sans serif Tahoma. 

Skilton et al. (2018) conducted a focus group that involved eight people with 
deaf-blindness. Such individuals are born with a hearing loss and develop visual 
impairment during their early childhood. The aim of the focus group was to identify 
the participants’ accessibility needs for their involvement in future research. Their 
recommendations included the provision of printed materials in a large size (18-point 
or higher) and in a sans serif typeface such as Arial. However, Skilton et al. acknowl- 
edged that these were among the recommendations that were typically provided for 
improving the accessibility of information of deaf-blind people. Their account left 
it unclear whether these recommendations were based on their own negative expe- 
riences with serif typefaces or whether they were just repeating back conventional 
attitudes that they had previously acquired from figures in authority. 


8.5 Readers with Acquired Visual Impairment 


Prince (1967) suggested that the impact of typographical variables might be different 
in older people with acquired visual disorders. As mentioned in Sect. 8.2, Shaw 
(1969) had found an advantage for a sans serif typeface over a serif typeface for those 
with acquired visual impairment but not for those with congenital visual impairment. 

Estey et al. (1990) showed 52 patients with an average age of 69.4 years admitted 
for cataract surgery a page of text printed in a 12-point sans serif typeface, Univers 
Medium. Only 65% of the patients said that they could read the text, while 35% found 
it blurry. Estey et al. then showed the patients two pages of text: One was printed 
in 14-point Univers Medium, and the other was printed in a 14-point serif typeface, 
Century Schoolbook. Only 2% of the patients said that they could not read these 
texts. Of the 52 patients, 65% said that they preferred Univers Medium, 33% said 
that they preferred Century Schoolbook, while 2% had no preference. Estey et al. 
argued that material for patients with visual deficits should be printed in 14-point 
sans serif typefaces. 
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Campbell et al. (2006) carried out two studies in people with age-related macular 
degeneration (AMD). Participants were recruited from the members of the Cana- 
dian National Institute for the Blind (now the CNIB Foundation). They were asked 
to compare samples of text printed in six different typefaces using reading aids if 
necessary. Two were serif typefaces (Times Roman and Lucida), while four were sans 
serif typefaces (Adsans, Arial, Clearview, and Verdana). Adsans had been devised in 
1959 to be used in a small (4.75 point) size in newspaper classified advertisements. 
It is not available in common computer applications, but it was used as the basis for 
Verdana. Clearview was devised in the 2000s for use on road signs. In both studies, 
the participants rated how easy it was to read each typeface on a 7-point scale from 
“impossible to read” to “very easy to read.” In the first study, they also ranked the 
samples from the easiest to the hardest to read. However, this proved to be rather 
demanding, and so ranking was not used in the second study. 

In the first study, 241 participants aged 50 or older were shown excerpts from 
Robert Louis Stevenson’s novel, Treasure Island, all printed in 16-point type. The 
passage printed in Adsans was given significantly higher ratings than any of the other 
samples, and it was ranked the easiest to read by more than 50% of the participants. 
In the second study, 157 participants were shown information leaflets for over-the- 
counter medicines amended to refer to unfamiliar products and printed in 7-point 
type. The leaflet that was printed in Adsans was once again given significantly higher 
ratings than any of the other samples. Campbell et al. remarked that in both studies 
Times Roman had been given low ratings despite being a relatively familiar typeface, 
which suggested that the participants were not simply rating the typefaces on the basis 
of their familiarity. These findings show that people with AMD have a preference 
for Adsans, but they do not constitute evidence with regard to its objective legibility. 

Rubin et al. (2006) asked 43 patients with mild cataract or glaucoma to read texts 
printed in four different typefaces. One was Tiresias, a sans serif typeface developed 
for people with impaired vision by the Royal National Institute of Blind People in the 
United Kingdom. This was compared with the serif typeface, Times New Roman, 
and two other sans serif typefaces, Foundry Form Sans and Helvetica. Their reading 
speed was found to be significantly faster with Tiresias than with the other three 
typefaces. However, although nominally of the same point size, the four typefaces 
occupied different amounts of horizontal and vertical space. When this factor was 
statistically controlled, the advantage of Tiresias disappeared. Rubin et al. concluded 
that variations in typeface had little influence on the reading speed of people with 
mild to moderate sight problems. 

Tarita-Nistor et al. (2013) tested 24 patients with AMD using reading charts 
printed in Times New Roman, Courier, Arial, and a version of Andale Mono. These 
required them to read individual sentences presented at progressively smaller sizes. 
Times New Roman is a serif typeface, and Courier is a slab serif typeface, whereas 
Arial and Andale Mono are both sans serif typefaces. Times New Roman and Arial are 
both proportionally spaced, whereas Courier and Andale Mono are both monospaced. 
Tarita-Nistor et al. measured three aspects of their participants’ performance: their 
reading acuity, which was the smallest print size that could be read without significant 
errors; the maximum reading speed, which was the highest speed at which text could 
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be read without regard to print size; and the critical print size, which was the smallest 
print size that could be read with maximum speed. There was no significant variation 
among the typefaces in either critical print size or maximum reading speed, but there 
was significant variation in reading acuity: surprisingly, and—contrary to the results 
of the study by Smither and Braun (1994) that was mentioned in Sect. 7.6—1text 
printed in Courier yielded significantly better reading acuity than text printed in the 
other three typefaces, but text printed in Arial yielded significantly worse reading 
acuity than text printed in the other three typefaces. 

Hedlich et al. (2018) administered reading charts containing sentences printed in 
the slab serif typeface Courier New or the sans serif typeface Arial to 16 patients 
with visual impairment before or after cataract surgery. They found no significant 
difference between the two typefaces in their reading acuity, in their critical print 
size, or in their maximum reading speed. One limitation of this study, apart from the 
small sample size, was that the reading chart printed in Arial was always presented 
before the reading chart printed in Courier New, so that the researchers had no control 
over the effects of fatigue or practice. They also asked the participants about their 
preference between the two typefaces: eight of the participants preferred Arial, four 
preferred Courier New, and four had no preference. 

Nersveen et al. (2018) carried out a postal survey of adults with a wide variety 
of visual impairments. They identified ten typefaces for consideration. Seven were 
printed in regular font, including two serif typefaces (Scala and Times Roman) and 
five sans serif typefaces (Frutiger, Helvetica, Scala Sans, Tiresias, and Verdana). 
Three were printed in bold font, including one serif typeface (Scala Bold) and two 
sans serif typefaces (Scala Sans Bold and Tiresias Bold). The participants were a 
random sample of 5,000 members of the Norwegian Association of the Blind and 
Partially Sighted. Each typeface was presented in five different body sizes (8, 10, 12, 
14, and 16 points, but scaled so that their x-heights matched those of Times Roman), 
and in ten variations in contrast (from black type on white background to white type 
on black background), yielding 500 conditions. Each was presented as three or more 
lines of text, together with a 4-point rating scale in which the response categories 
were “Easily readable”, “Readable with some difficulty”, “Difficult to read”, and 
“Unreadable”. This yielded a booklet consisting of 50 printed pages. 

The participants were instructed to carry out the task only if they were partially 
sighted and able to read printed text (with a magnifying glass or with supplementary 
lights ifnecessary). Completed booklets were returned by 830 participants. Repeated- 
measures tests were employed to compare the ratings given to the serif typefaces and 
the sans serif typefaces. For typefaces at 12 and 14 points, the difference was not 
statistically significant. For those at 8, 10, and 16 points, sans serif typefaces received 
significantly higher ratings than did serif typefaces. Nevertheless, the differences 
were small in magnitude and only achieved significance because of the very large 
sample size. A similar pattern emerged when comparing the ratings given to the Scala 
typefaces and the Scala Sans typefaces (J. Nersveen, personal communication, June 
22, 2020). 
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Although Nersveen et al. described their experiment as a study of the legibility of 
printed text, their data actually consisted of the participants’ ratings of the subjective 
acceptability of the different typefaces rather than any measure of their objective 
legibility. The apparent preference for sans serif typefaces is thus consistent with the 
findings of Estey et al. (1990) and Campbell et al. (2006), although the effect was 
far less pronounced. One problem is the response rate of only 16.6%. This might be 
partly explained by the fact that the participants did not receive any personal reward 
for carrying out the task, although two respondents chosen at random were given a 
“prize” of a Digital Audio Broadcasting radio worth 1,000 Norwegian kroner. As a 
result, most of the participants may not have been willing to devote time and effort to 
a rather burdensome task. Whatever the cause, it does suggest that the study suffered 
from sampling bias, in that the respondents might not have been representative of 
the target population. 


8.6 Readers with Aphasia 


The term aphasia covers a wide variety of disorders of spoken language, but a 
majority of people with aphasia also exhibit impairment of reading (Brookshire et al., 
2014). Wilson and Read (2016) tested nine participants who had been diagnosed with 
mild-to-moderate aphasia as the result of cerebrovascular accidents. They were given 
a standardised test of reading comprehension that consisted of 35 short paragraphs. 
In each case, the participants had to choose one of four alternative words or phrases 
to complete the final sentence. For each participant, the paragraphs were randomly 
assigned to one of seven conditions. One involved the presentation of the original 
paragraph in a serif typeface (Times New Roman), and the other six involved different 
manipulations. For two manipulations, the typeface was amended either to a sans serif 
typeface (Verdana) or to an ornate cursive typeface (Harrington). The participants 
achieved significantly higher scores with the sans serif typeface than with either 
of the other two typefaces. Wilson and Read did not speculate as to why patients 
with aphasia might find serif typefaces less legible. Some researchers have found 
that people with aphasia prefer material printed in a sans serif typeface (Rose et al., 
2011), but other researchers have not (Haw, 2017, p. 129; Herbert et al., 2019). 


8.7 Readers with Dyslexia 


The term dyslexia refers to a specific disorder of reading that may result from a 
wide variety of causes (and may be either congenital or acquired). For many years, 
the British Dyslexia Association (2018) has recommended that documents printed 
for people with dyslexia should use sans serif typefaces, and this has been taken 
to encompass teaching materials for students (Shaw & Anderson, 2017). However, 
the Association did not cite any evidence to support this recommendation. In fact, 
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relevant evidence has been obtained in attempting to evaluate Dyslexie, a typeface 
that was developed by Christian Boer in 2008 to try to facilitate reading among 
children and adults with dyslexia (http://www.dyslexiefont.com). It is a sans serif 
typeface characterised by a relatively large x-height and relatively wide vertical and 
horizontal spacing between the letters. It is available under licence for both Microsoft 
Windows and Apple computer systems. 

Marinus et al. (2016) recruited 39 children who were undergoing remediation for 
low progress in reading. They were asked to read aloud four passages, each of 200 
words. One passage was presented in a 14-point Dyslexie typeface. A second was 
presented in 16-point Arial, a sans serif typeface which matched the x-height of the 
letters in the first passage. A third passage was presented in 16-point Arial with an 
overall increase in spacing. The fourth passage was presented in 16-point Arial with 
increased spacing both between words and between letters within words to match 
the spacing used in the first passage. The order of the conditions and the assignment 
of the passages to the conditions was counterbalanced across different participants. 
The children were scored on the number of words that they had read correctly per 
minute in each of the four conditions. Marinus et al. found that their reading speed 
in the first condition was significantly faster than in either the second or third, but 
that it was not significantly different in the fourth condition. 

Kuster et al. (2018) carried out two experiments to evaluate the Dyslexie type- 
face. In the first experiment, 170 children with dyslexia were asked to read aloud 
two passages at separate sessions. One passage was presented in 12-point Dyslexie; 
the other was presented in 13-point Arial, adjusted to match the vertical spacing of 
Dyslexie. The order of the two conditions was counterbalanced across the partici- 
pants. The children read significantly more quickly and made fewer errors on the 
second passage than on the first. However, there was no significant difference in 
either reading time or the number of errors between the typefaces. 

In their second experiment, Kuster et al. tested 102 children with dyslexia and 45 
children without dyslexia. They were each asked to read aloud three lists of words of 
varying complexity at three separate sessions. At each session, one list was presented 
in Dyslexie, whereas the other two lists were presented in the serif typeface Times 
New Roman and the sans serif typeface Arial, in both cases adjusted to match the 
x-height and the vertical spacing of Dyslexie. The order of the three conditions and 
the assignment of word lists to conditions was counterbalanced, and the children 
were scored on the number of words that they read correctly in one minute. Not 
surprisingly, the children without dyslexia obtained higher scores than the children 
with dyslexia, and performance varied inversely with the complexity of the words. 
However, there was no significant difference in the performance of either the children 
with dyslexia or the children without dyslexia across the three typefaces. 

Finally, Powell and Trice (2020) recruited 36 children with dyslexia. They were 
each asked to read aloud three stories; each story contained 200 words and was 
followed by three factual questions to test the children’s comprehension. One story 
was presented in 12-point Dyslexie; the others were presented in 14-point Arial 
and 14-point Times New Roman, both adjusted to match the horizontal and vertical 
spacing of Dyslexie. The order of the three stories and the assignment of the stories 
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to the three typefaces was counterbalanced across different children. There was no 
significant variation across the three typefaces in terms of the mean time that the 
children took to read the stories, no significant variation in terms of the number of 
errors that they made, and no significant variation in their comprehension scores. 

All three of these studies indicated that the effectiveness of the Dyslexie type- 
face is due to its increased spacing and not to its different letter shapes. If other 
typefaces are adjusted to match the spacing used for the Dyslexie typeface, the 
reading performance of children with dyslexia does not differ across the different 
typefaces. However, both Kuster et al.’s (2018) second experiment and Powell and 
Trice’s (2020) study show in addition that the reading performance of children with 
dyslexia does not differ between serif and sans serif typefaces if they are matched 
in terms of their spacing. This contradicts the recommendation made by the British 
Dyslexia Association (2018) that documents printed for people with dyslexia should 
use sans serif typefaces. 


8.8 Conclusions 


Any differences in the legibility of serif and sans serif typefaces might become more 
apparent in readers whose visual systems are challenged as the result of disablement. 
In fact, the modal finding is that there are no differences in the reading capability of 
readers with a variety of disabilities when they are presented with material printed 
in serif and sans serif typefaces. It might be thought that children with congenital 
visual impairment would be more sensitive to typographical factors, but in fact such 
children rapidly adapt to reading both serif and sans serif typefaces. It has been 
suggested that the effects of acquired visual impairment might be different from 
the effects of congenital visual impairment, but both groups appear to be equally 
proficient in reading serif and sans serif typefaces. A majority of people with aphasia 
exhibit impairment of reading. In this field, it is often taken for granted that people 
with aphasia will find sans serif typefaces more legible, but there is only one study 
with a very small sample of participants that supports this position. Certainly, there 
is now good evidence that the reading performance of children with dyslexia does 
not differ between serif and sans serif typefaces when they are matched in terms of 
their spacing. 
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Chapter 9 A) 
General Conclusions to Part I E 


9.1 Key Findings from Part I 


As was mentioned in Sect. 1.4, Part I has reviewed diverse studies using diverse 
methods of data collection, and this precludes any formal meta-analysis to inte- 
grate the findings. One must instead focus on the most common finding—the modal 
finding—regarding the legibility of serif and sans serif typefaces: superiority of serif 
typefaces; superiority of sans serif typefaces; or no difference. 

The research question is whether there are differences in the legibility of serif 
and sans serif typefaces when they are used to generate printed material. The modal 
finding from the research studies that have been reviewed is that there are not. This 
applies to four out of six experiments on reading letters and words (Sects. 4.1 and 
4.3); the two Korean studies yielded contradictory results. It applies to five out of six 
experiments on reading sentences (Sect. 5.1; see also Sects. 6.1 and 6.2) and to all 
four experiments on the comprehension of text, whether using measures of speed or 
accuracy (Sect. 5.2). It also applies to all eight experiments on the reading capability 
of younger readers and to two of the three experiments on the reading capability 
of older readers (Chap. 7). Two studies have been cited in support of the supposed 
superiority of serif typefaces, but these can be discounted: one failed to report any 
empirical data on the issue (Burt, 1959; Burt et al., 1955), and the other suffered 
from irredeemable methodological problems (Wheildon, 1990, 2005). 

It is unfortunate that there has been relatively little work on the legibility of serif 
and sans serif typefaces in readers with disabilities as opposed to their subjective 
preferences for different kinds of typeface (Chap. 8). Two studies found no difference 
in legibility between serif and sans serif typefaces (Pittman, 1976; Rubin et al., 2006). 
One found a superiority for serif typefaces among children with congenital visual 
impairment (Nolan, 1959), but this study seems to have suffered from methodological 
problems. A fourth study found a superiority for sans serif typefaces among patients 
with aphasia (Wilson & Read, 2016). It is generally assumed that sans serif typefaces 
are more appropriate for people with aphasia, and there is an urgent need for more 
research to evaluate this assumption. Finally, two studies have found that the reading 
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performance of children with dyslexia does not differ between serif and sans serif 
typefaces when they are matched in terms of the spacing of the letters. 


9.2 Preferences and Connotations 


With regard to research studies concerned with readers’ preferences between serif 
and sans serif typefaces and the connotations of the two kinds of typeface, the modal 
finding is that among adult readers there is no overall preference between serif and 
sans serif typefaces, nor any overall difference in the connotations of serif and sans 
serif typefaces (Sect. 5.4). Even so, there is a suggestion that readers’ preferences 
and the connotations of serif and sans serif typefaces may vary between different 
contexts (see Schriver, 1997, pp. 289-303; Zachrisson, 1965, pp. 156-62). This has 
major implications for educational publishing and educational assessment: 


e For authors, editors, and publishers of books in many fields, any such differences 
will be mainly of commercial relevance. However, authors and editors of academic 
articles (and of books, too, in the humanities) will want to be assured that their work 
is evaluated in terms of its content rather in terms of its typographical appearance. 
This provides a far more logical reason for requiring that manuscripts should 
be submitted for publication to academic journals and publishers in a standard 
typeface than simply asserting that one kind of typeface is more legible than 
another. 

e The issue of fairness is especially relevant in the context of academic assessment. 
It is possible that teachers and other assessors will give more positive evaluations 
of students’ assignments if the teachers and students share the same typographical 
preferences than if they differ in those preferences (although there seems to be 
no empirical evidence on this matter). It would be useful if teachers who are 
responsible for particular course units (and, ideally, for entire degree programmes) 
could agree on their typographical preferences and make these known to their 
students. 


There is a need for research on whether reviewers’ evaluations of academic 
manuscripts and teachers’ evaluations of students’ assignments are affected by their 
own preferences and expectations. Nevertheless, the available evidence suggests that 
these variations in readers’ expectations and preferences depend on their prior expe- 
rience and familiarity with different typefaces and not on any intrinsic properties of 
the typefaces themselves. Indeed, the results that were obtained by Uysal and Diiger 
(2012) indicate that even readers who are visually impaired will find most typefaces 
relatively congenial after a gradual period of exposure. 
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9.3 Implications for Previous Assumptions 


Where does this leave previous assumptions about the legibility of serif and sans 
serif typefaces? There is no support for traditional beliefs that serif typefaces are 
superior to sans serif typefaces and certainly no support for Morison’s (1959) asser- 
tion that “the serif is essential to the reading of alphabetical composition” (p. xi). 
Regarding the American Psychological Association’s (2010) insistence that a serif 
typeface “improves readability and reduces eye fatigue” (pp. 228—229), Perea (2013) 
remarked: “There are no well-founded theoretical reasons to use of [sic] a serif font 
over a sans serif font—beyond subjective preferences” (p. 16). To this one might 
add: and there is no convincing empirical support, either. 

The assertion contained in Merriam-Webster’s Manual for Writers and Editors 
(1998) that “studies of typeface legibility have tended to demonstrate that standard 
serif typefaces can be read somewhat more easily and quickly than standard sans- 
serif typefaces” (p. 330) is factually incorrect. Finally, the length of the reference 
list at the end of this book contradicts Kullmann’s (2015) statement that there have 
only been “sporadic” studies on this issue (p. 1), and one can certainly dismiss his 
assertion that previous research has not led to any clear conclusion. On the contrary, 
based on the wealth of evidence that has accumulated over the last 140 years, the 
clear conclusion is that there is no difference in the legibility of serif typefaces and 
sans serif typefaces when they are used to produce printed material. 


9.4 The American Psychological Association’s Current 
Position 


The guidelines in the sixth edition of the Association’s Publication Manual followed 
those in previous editions. However, a seventh edition was published while this 
monograph was being written; the new guidelines have already been adopted by the 
American Educational Research Association and are likely to be adopted by other 
organisations in the future. This seventh edition takes a rather different approach 
(American Psychological Association, 2020): 


APA [American Psychological Association] Style papers should be written in a font that is 
accessible to all users. Historically, sans serif fonts have been preferred for online works and 
serif fonts for print works; however, modern screen resolutions can typically accommodate 
either type of font, and people who use assistive technologies can adjust font settings to their 
preferences. Thus, a variety of font choices are permitted in APA style... . 


Use the same font throughout the text of the paper. Options include 


e asans serif font such as 1 1-point Calibri, 1 1-point Arial, or 10-point Lucida Sans Unicode 
or 


e a serif font such as 12-point Times New Roman, | 1-point Georgia, or normal (10-point) 
Computer Modern.... 
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We recommend these fonts because they are legible and widely available and because they 
include special characters such as math symbols and Greek letters. (p. 44) 


An accompanying background paper confirms that the focus of the new guidelines 
is on the accessibility of typefaces for users with disabilities rather than on their 
legibility per se (Accessibility, 2020). The paper also refers to the Web Content and 
Accessibility Guidelines produced by the World Wide Web Consortium, suggesting 
that itis concerned with reading from screens rather than reading from paper, although 
this is not made explicit. With regard to the legibility of serif and sans serif typefaces, 
the paper makes the following statement: 


It is a common misconception that serif fonts (e.g., Times New Roman) should be avoided 
because they are hard to read and that sans serif fonts (e.g., Calibri or Arial) are preferred. 
Historically, sans serif fonts have been preferred for online works and serif fonts for print 
works; however, modern screen resolutions can typically accommodate either type of font, 
and people who use assistive technologies can adjust font settings to their preferences. 


Research supports the use of various fonts for different contexts. For example, there are 
studies that demonstrate how serif fonts are actually superior to sans serif in many long texts 
(Arditi & Cho, 2005; Tinker, 1963). And there are studies that support sans serif typefaces 
as superior for people living with certain disabilities (such as certain visual challenges and 
those who learn differently; Russell-Minda et al., 2007). (“Myth 1,” paras. 1-2) 


The choice of reference citations in this statement is rather odd. First, in discussing 
different styles of typeface, Tinker (1963, pp. 46—48) referred to the study by Paterson 
and Tinker (1932), who used a speed-of-reading test to measure the legibility of each 
of ten typefaces. Seven were serif typefaces that had been nominated by a large 
number of editors and publishers as being worthy of study, of which Scotch Roman 
was used as a benchmark. The results showed that “type faces in common use do not 
differ significantly” (Tinker, 1963, p. 48). The other three were chosen in order to be 
“radically different” (p. 46): Kabel Light, a sans serif typeface; American Typewriter, 
a slab serif typeface that imitated typewriting; and Cloister Black, an elaborate serif 
typeface. Both American Typewriter and Cloister Black were read significantly more 
slowly than Scotch Roman, but Kabel Light was not. Tinker concluded: “Type faces 
in common use are equally legible.... A serifless type, Kabel Light, is read as rapidly 
as ordinary type” (p. 64). In other words, Tinker (1963) did not show that serif 
typefaces were superior to sans serif typefaces. Indeed, in a subsequent annotated 
bibliography, Tinker (1966, p. 84) strengthened his conclusion in the light of research 
findings since the study by Paterson and Tinker (1932): “Typefaces in common use 
are equally legible. This includes the typefaces with serifs and those without serifs.” 

Second, in addition to the study by Arditi and Cho (2005) that was mentioned in 
Sect. 5.1 and involved the presentation of “scrambled” text, these researchers carried 
out an experiment where they asked just four participants to read aloud individual 
sentences. The sentences were presented one word at a time on a computer screen. 
Arditi and Cho found no difference in performance between sentences in a slab serif 
typeface and sentences in a sans serif typeface. It should be noted that they did not 
make use of “long texts”. However, the main point is that, once again, Arditi and 
Cho did not show that serif typefaces were superior to sans serif typefaces. 
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Third, the review by Russell-Minda et al. (2007) on the legibility of typefaces 
for readers with visual impairment covered both research on reading from paper and 
research on reading from screens. They did indeed conclude: “Sans serif typefaces, 
such as Arial, Helvetica, Verdana, or Adsans, are more readable than is Times New 
Roman, for example” (p. 413). However, this was not supported by the evidence 
that they described: they cited eight studies, of which six had found no significant 
difference in legibility between serif and sans serif typefaces. In fact, their abstract 
stated, “Research has not produced consistent findings” (p. 402). Moreover, in the 
original report on which their published review was based, Russell-Minda et al. 
(2006) had arrived at a very different conclusion: “Based on results from existing 
studies, the effects of the presence or absence of serifs on text legibility seem to be 
inconclusive” (p. 23). In short, they definitely did not demonstrate that sans serif 
typefaces were superior for people living with certain disabilities. 

In other words, each of these three reference citations is in error because it fails to 
support the statement to which it is attached. In the bibliographic research literature, 
these are referred to as “quotation errors”, although they include indirect quotations, 
paraphrases, and summaries as well as direct quotations. Mertens and Baethge (2011) 
demonstrated that around 20% of reference citations in the medical and bioscience 
literature were quotation errors, but it is clearly unfortunate that such errors should 
occur in a document published by the American Psychological Association. 


9.5 Conclusions 


This chapter concludes Part I by summarising and discussing the key findings. Are 
there any differences in the legibility of serif and sans serif typefaces when they are 
used to generate printed material? The modal finding from the research studies that 
have been reviewed is that there are not. Two studies in particular have been cited in 
support of the superiority of serif typefaces, but these can be discounted on scientific 
grounds. Are there any differences in readers’ preferences and connotations between 
serif and sans serif typefaces when they are used to generate printed material? The 
modal finding is that there is no overall preference between serif and sans serif type- 
faces, nor any overall difference in their connotations. Even so, there is a suggestion 
that readers’ preferences and the connotations of serif and sans serif typefaces may 
vary between different contexts, and the chapter discussed the implications of this 
for educational publishing and educational assessment. 

The chapter considers the relevance of the findings for previously stated assump- 
tions about the legibility of serif and sans serif typefaces. The traditional view that 
“everybody knows” that serif typefaces are easier to read on paper than sans serif 
typefaces is clearly untenable, since this view has never been supported by sound 
empirical evidence. Finally, the chapter concludes by assessing the position that is 
adopted in the seventh (2020) edition of the American Psychological Association’s 
Publication Manual. The position confounds research on reading from paper with 
research on reading from computer screens, and the background paper on which it 
depends suffers from several quotation errors (that is, reference citations that do not 
support the statements to which they are attached). 
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Part II 
Reading from Screens 


Chapter 10 A) 
“Everybody Knows”: Reading get 
from Screens 


10.1 Introduction 


The last 60 years have seen fundamental changes in the way that people acquire and 
share information. At the beginning of this period, information was mainly repre- 
sented in the form of written text that was either read directly by readers themselves 
or else communicated by teachers or other speakers. Nowadays, much information 
is available on computer monitors or other screens. (One should, of course, recog- 
nise the “digital divide” both between people in the developed world and people in 
the developing world and also among diverse groups within the developed world in 
terms of their access to the relevant technologies.) The availability of this technology 
is particularly apparent in educational settings, where students routinely expect to 
acquire information through computer systems and also use such systems to submit 
their academic assignments to be evaluated by their teachers and other assessors. 
Users of screen-based material have a choice in how they make use of that material. 
They can either read it in the form in which it is presented, or they can print it in 
hard copy. Shaikh (2004; Shaikh & Chaparro, 2004) carried out an online survey 
concerning reading habits and obtained responses from 330 participants: 120 were 
students, and the rest were recruited from a variety of e-mail lists. Most respondents 
were happy to read news items, newsletters, and product information online, but they 
preferred to scan articles in journals online before printing them off to read in more 
detail. This was consistent with the findings of previous studies that researchers and 
other professionals tended to print off articles or other important documents to read. 
It used to be popular to assume that the increased use of digital technologies 
among young adults meant that they constituted a distinct population who thought 
and learned in qualitatively different ways from older people. They were variously 
called “Millenials” (Strauss & Howe, 1991), the “Net Generation” (Tapscott, 1998), 
“Digital Natives” (Prensky, 2001), and “Generation Y” (Jorgensen, 2003). However, 
these ideas were not supported by research evidence (see, e.g., Pedró, 2009). Surveys 
found quantitative differences between older and younger people in their attitudes to 
technology and their uses of technology, but there was no evidence for any qualitative 
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differences in people born since the early 1980s (e.g., Jelfs & Richardson, 2013). In 
fact, many of today’s students retain a strong preference for print (Baron et al., 2017; 
Mizrachi, 2015), although some do not (Singer & Alexander, 2017). (This may well 
depend on mundane factors such as local printing costs.) 

Which typefaces should be used in the presentation of text on computer screens? 
One approach was to employ versions of typefaces that were already well established 
in conventional printing. The top row of Fig. 10.1 contains two examples of such 
typefaces. As was mentioned in Sect. 1.2, the serif typeface Times New Roman was 
devised in 1932 for use in the London newspaper The Times. Versions became popular 
in printing and publishing more generally, and in the 1980s these were provided by 
Macintosh and Microsoft in their word-processing software. The typeface known as 
“Times Roman” was adopted by Apple. The sans serif typeface Arial was devised in 
1982 for use on IBM printers. It was adopted by Microsoft in 1990, and it was the 
default typeface for several years in some of its applications; it is also available for 
Macintosh users. 

An alternative approach was to devise new typefaces, often with the aim of 
ensuring their legibility on small or low-resolution screens. The bottom row of 
Fig. 10.1 contains two examples of such typefaces. The serif typeface Georgia and 
the sans serif typeface Verdana were both developed for Microsoft in the 1990s 
and were released in 1996. They were subsequently made available for installa- 
tion on Macintosh computers. Other researchers developed entire families of type- 
faces. In the United States, Bigelow and Holmes (1986) devised the Lucida family, 
while in Russia Paratype was devised for material in both Latin and Cyrillic text 
(Akhmadeeva et al., 2012). Other artificial typefaces were devised by Arditi (2004), 
Beier and Larson (2010), and Sanocki (1987), although some of these were so radi- 
cally different from traditional typefaces that they might well have caused difficulties 
even for experienced readers. 

It was mentioned in Sect. 5.1 that actual serif and sans serif typefaces typically 
differ in a number of characteristics. In principle, it should be possible to devise 
artificial typefaces in which serif and sans serif differ only in the presence or absence 
of serifs. In fact, efforts to devise such typefaces disclosed a major confound with the 
width of the letters and the inter-letter spacing. In particular, Arditi and Cho (2005) 
found that adding serifs to Arditi’s (2004) sans serif style led to an increase in the 
mean width of the letters in order to accommodate the serifs. Conversely, Moret- 
Tatay and Perea (2011) found that removing the serifs from the serif style Lucida 


Times New Roman: The quick brown | Arial: The quick brown fox jumps over 
fox jumps over the lazy dog. the lazy dog. 


Georgia: The quick brown fox Verdana: The quick brown fox 
jumps over the lazy dog. jumps over the lazy dog. 


Fig. 10.1 Examples of common serif typefaces (Times New Roman and Georgia) and common 
sans serif typefaces (Arial and Verdana) for displaying text on screens 
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Bright led to an increase in the average inter-letter spacing. This confounding means 
that any results that are obtained using such typefaces are likely to be ambiguous. 

In conventional paper-based printing, individual letters and other symbols are 
discrete physical entities. As was mentioned in Sect. 2.4, the overall height of type- 
faces (their body size) is traditionally expressed in terms of points, where one point 
is approximately equal to 0.35 mm. However, the size of typefaces is also expressed 
in terms of the dimensions of lowercase letters. The x-height of a typeface is the 
height of lowercase letters that do not have either ascenders or descenders (such 
as the letter x itself). When the same characters are presented on computer screens, 
they are simply fragmented digitised representations. Body size and x-height become 
relative terms since they depend on the size in which the characters are displayed 
on-screen. Rendering printed typefaces as digitised letter forms has been an exceed- 
ingly complex process involving complex debates and decisions (Bigelow, 2020a, b), 
although much of this process may well not be apparent to most display-screen users. 

Section 2.5 argued that there was no reason to think that serifs and other features 
had the same consequences when people were reading from paper as when they were 
reading from computer monitors or other screens. In fact, with regard to reading 
from computer screens, designers and design educators have typically claimed that 
the legibility of sans serif typefaces was superior to that of serif typefaces (e.g., 
Poncelet & Proctor, 1993; Schriver, 1997, p. 508; “Universal Design’, 1999, p. 5), 
hence my colleague’s assertion, mentioned in Sect. 1.1, that “everybody knew” that 
sans serif typefaces were easier to read on screens than were serif typefaces. Even 
so, advocates of this position have usually failed to present any empirical evidence 
in support of this claim, and accordingly Part II of this book reviews the research 
literature with regard to the legibility of serif typefaces and sans serif typefaces when 
they are used to generate material on computer monitors and other screens. 


10.2 Legibility of Serif and Sans Serif Typefaces Using 
Older Technology 


Before considering the legibility of typefaces using modern computers, it should be 
acknowledged that the use of technology to enable information to be presented in 
other ways than as material printed on paper is by no means a new phenomenon. From 
the eighteenth century onwards, speakers used epidiascopes (opaque projectors also 
known as episcopes) and “magic lanterns” (using transparent plates) to project images 
of objects and other material onto viewing screens for potentially large audiences. 
However, from the 1950s these were superseded by overhead projectors and slide 
projectors. The widespread adoption of the latter technologies led to the development 
of recommendations for best practice. It was widely asserted, in particular, that sans 
serif styles rather than serif styles should be used for the projection of textual material. 
Nevertheless, as Phillips (1976, pp. 18-19) observed, such assertions seem to have 
been based on personal preferences rather than empirical research. 
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Adams et al. (1965) compared the legibility of different typefaces in the images 
produced by an overhead projector. They tested 120 children in Grades 1, 2, 3, 5, 
and 6 (i.e., aged 6-12) of a university’s laboratory school. The children in each grade 
rotated among five rows of seats at varying distances from the projection screen. 
They were shown groups of four uppercase letters produced on an electric typewriter 
in five different styles and sizes and were asked to list them on a prepared response 
form. For one style, the letters were in a sans serif typeface (Bulletin). For the other 
four styles, they were produced in different sizes in a serif typeface (Elite). One of 
these styles, Elite 6/32 in. (4.76 mm), was the same size as the Bulletin type. Adams 
et al. noted that letters in the Elite 6/32 in. style were significantly more likely to be 
reported correctly than were letters in the Bulletin style by children in four of the 
five grades. Even so, they added that this “may be a phenomenon of the sample and 
might not similarly be observed in a replication of the study” (p. 427). 

Grooters (1972) carried out a similar study in which rows of ten uppercase 
letters were presented to 60 adult participants in four different sizes at four different 
distances by means of a Kodak Carousel slide projector. Instead of actual typefaces, 
he employed the templates that were in use at the time for lettering in technical 
drawing. The participants were required to read the letters aloud as if in an eye 
examination. He found that the sans serif style LeRoy Standard was marginally 
more legible than the slab serif style LeRoy Stymie Medium, but that both of these 
were significantly more legible than the sans serif style LeRoy Condensed Gothic, 
in which the letters were 60% of the width of those in LeRoy Standard. These results 
were found at all distances and in all sizes, except for the closest distance and the 
largest size where performance approached 100% for all three styles (in other words, 
there were ceiling effects). 

Phillips (1976) similarly compared a slab serif style of lettering (Leroy Stymie) 
with a sans serif style (Twentieth Century). Using a Kodak Carousel projector, he 
presented 31 volunteers with six slides, each containing five lines of ten randomly 
ordered uppercase letters in decreasing size. Three slides contained letters printed in 
the slab serif style, and three contained letters printed in the sans serif style, in each 
case using a light, medium, or bold stroke width. Once again, the participants were 
asked to read the letters in each slide aloud, line by line. Overall, performance was 
better with the slab serif style than with the sans serif style (pp. 53-54). There was 
however a significant interaction between the effects of letter style and letter size, such 
that the difference between the two styles was only significant using one of the smaller 
letter sizes (pp. 56-57). There was also a significant three-way interaction with stroke 
width, such that the difference between the two styles was only significant for two 
of the 15 combinations of size and stroke width (pp. 59-60). Phillips concluded 
that performance was so poor with the smaller letter sizes that neither style would 
be acceptable for projected visual materials; but that conversely performance was 
so good using either style with the larger letter sizes that both serif and sans serif 
lettering would be acceptable for projected visual materials (p. 70). 

Woods et al. (2005) showed pairs of lowercase letters to groups of children from 
kindergarten to fourth grade using a tachistoscope attached to a slide projector. The 
children had to say whether the letters in each pair were the same or different and to 
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write them down. Each pair was presented in a serif typeface (Times New Roman) or 
in a sans serif typeface (Arial), and the two typefaces were presented either in separate 
blocks of slides or in the same block. The children’s scores on both discrimination 
and identification were higher for letters that were presented in the sans serif typeface 
than for those presented in the serif typeface. This was true regardless of the children’s 
age or the size of the typeface used. One problem with this study is that the children 
were tested in small groups, and some cheating had taken place. 

Overhead projectors and slide projectors have in turn been superseded by 
computer-based projection software, most obviously by Microsoft PowerPoint. (This 
was developed in the 1980s by an independent software company to produce 
both overhead transparencies and slides. However, the company was acquired by 
Microsoft in 1987, and it was then developed to display presentations through digital 
projectors on both Windows and Macintosh systems.) PowerPoint has been in general 
use since the early 1990s, and in practice its applications in education and in other 
fields have tended to borrow techniques adopted with the older technology of slide 
projectors. In particular, presenters are still being advised to use sans serif typefaces 
rather than serif typefaces in their PowerPoint presentations (e.g., Garon, 1999). 
More recently, Ing et al. (2017) found that 14 out of 17 speakers at an ophthalmology 
conference had used sans serif typefaces in their presentations. They suggested that 
“serif fonts may be harder to read in digital slides” (p. 172). Phillips’ (1976) sugges- 
tion that such advice is based more upon personal preference than upon empirical 
research probably still applies, but one study has evaluated this directly. 

Earnest (2003) compared the legibility of serif and sans serif typefaces for mate- 
rial presented using PowerPoint. He assigned 138 students to five different groups. 
Four groups viewed a recording of a speech given by their university’s president 
incorporating a slide presentation, whereas the fifth group only viewed the speech. 
Two of the first four groups viewed slides using a serif typeface (Times New Roman), 
and the other two groups viewed slides using a sans serif typeface (Verdana). Imme- 
diately afterwards, the students answered nine multiple-choice questions on factual 
points mentioned in the speech. Nine days later, they were asked to complete the 
test for a second time. Earnest found that the groups who had viewed slides obtained 
higher scores than the group who had only viewed the speech. However, there were 
no significant differences between the groups who had viewed the slides in a serif 
typeface and the groups who had viewed the slides in a sans serif typeface. 

The limited number of studies using these older technologies have failed to 
produce unequivocal evidence favouring either serif or sans serif typefaces. Consis- 
tent with this idea, participants tend to give similar qualitative ratings of serif and 
sans serif typefaces presented using either slide projectors (Kastl & Child, 1968) or 
PowerPoint (Mackiewicz, 2007). There is certainly no support for the notion that 
sans serif styles should be routinely preferred for the projection of textual material. 
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10.3 Issues with Screen Technology 


Early computers typically lacked the facility for generating visual displays; instead, 
they produced text that was generated using rudimentary printers. In the 1960s, 
however, it was realised that visual displays might be useful, and appropriate tech- 
nology was at hand to facilitate this in the form of cathode-ray tubes (CRTs), which 
were widely used in scientific research (as oscilloscopes) and most obviously in tele- 
vision sets. In CRTs, an electron gun stimulates pixels arranged in a checkerboard 
pattern on a phosphorescent screen, and this process is carried out repeatedly in a 
systematic manner to generate a visible display. Early monitors tended to be very 
low resolution, meaning that the details of images were lost. Indeed, this is probably 
the origin of the idea that sans serif typefaces should be used, because serifs would 
have been among the lost details. Schriver (1997, p. 403) noted that designers for 
television had mainly used sans serif typefaces for many years (see also McVey, 
1985). Even so, technology improved, and by the year 2000 high-resolution colour 
CRT monitors had been developed. 

In the 1990s, however, an alternative form of technology became available through 
liquid crystal displays (LCDs). These use liquid crystals to modulate the light emitted 
from a background to generate images on a computer screen, again using a checker- 
board pattern of pixels that is repeatedly scanned. LCD monitors were initially devel- 
oped for use with laptop computers because of their reduced size, weight, and power 
consumption; however, during the 2000s they became available more generally, and 
their resolution typically exceeded that of CRT monitors. As a result, since the late 
2000s LCD monitors have generally superseded CRT monitors in computer-based 
applications. 

One issue with the presentation of text using both CRTs and LCDs is that of 
aliasing. In general terms, this is the under-sampling of the information needed to 
produce an accurate reproduction of a particular character. More specifically, in 
both CRT and LCD technology, text is displayed in the form of arrays of square 
pixels. If a stroke in a character is oblique rather than horizontal or vertical, its 
contour will receive only an approximate representation as an array of pixels. As a 
result, it will appear irregular and ragged rather than continuous (an effect sometimes 
known as “staircasing” or “the jaggies’’). In theory, such “‘aliased” text should be less 
legible than text printed on paper, and the effect should be more pronounced with 
low-resolution monitors than with high-resolution monitors. 

This issue was originally handled by means of anti-aliasing software. This 
smoothed the edges of characters by averaging the surrounding pixels to yield varying 
levels of grey scale in the contours of the displayed text. Gould et al. (1987) found 
that text presented on paper was read significantly faster than aliased text presented 
in the same typeface on a CRT; however, the difference became nonsignificant when 
text presented on paper was compared with anti-aliased text presented in the same 
typeface. An analogous process known as “‘scale-to-grey” was used when displaying 
images scanned from printed text on computer monitors. Sheedy and McCarthy 
(1994) found that scale-to-grey led to enhanced reading performance compared to 
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simple black-and-white text, and participants reported fewer symptoms of eye strain 
as a result of reading scale-to-grey text; as expected, both differences were greater 
with low-resolution CRTs than with high-resolution CRTs. 

The increasing use of LCDs in the 1990s enabled a different approach to be taken 
to the aliasing issue. In LCDs, each pixel consists of three vertical bars representing 
the colours red, green, and blue. In 1998, Microsoft introduced ClearType software 
which used sub-pixel rendering with the aim of enhancing the legibility of text 
presented in LCDs. However, later research using a variety of tasks failed to show an 
unequivocal advantage of ClearType text over aliased text (Aten et al., 2002; Dillon 
et al., 2004, 2006; Gugerty et al., 2004; Slattery & Rayner, 2010; Tyrrell et al., 
2001). Gugerty et al. (2004) also compared ClearType text with anti-aliased text 
using 10-point Verdana. They found that words presented in anti-aliased text were 
read significantly more slowly and less accurately than words presented in ClearType 
or aliased text, and they argued that anti-aliasing software should not be employed 
with smaller type sizes. 

One problem with ClearType software was that the resulting characters had 
coloured borders that were generally thought to be distracting for readers. Microsoft 
therefore offered ClearType with five levels of sub-pixel rendering that varied from 
greyscale with no colour filtering to a high level of colour contrast. Sheedy et al. 
(2008) evaluated both objective performance and subjective preference for material 
presented in a sans serif typeface (Verdana) with all five levels of sub-pixel rendering. 
They found that readers preferred a moderate level of ClearType rendering to higher 
levels or to greyscale, but that ClearType rendering did not improve text legibility, 
reading speed, or reading comfort. 

Another issue with screen technology is the rate at which images are refreshed: 
with CRTs, too slow a refresh rate leads to flicker. Wilkins (1986) showed that 
flicker tended to disrupt readers’ saccadic eye movements, even when the refresh rate 
was sufficiently high to render the flicker imperceptible. (Wilkins showed that eye 
movements were also disrupted when readers read a printed page that was illuminated 
by conventional fluorescent lighting.) With LCDs, the screen itself does not flicker, 
but in many models the screen is backlit using pulse width modulation. The flicker 
may not be perceptible, but readers’ eye movements may be disrupted, and they 
may complain of discomfort (Brown et al., 2020; Wilkins, 2021). Thus, any findings 
regarding participants’ eye movements obtained when reading from either CRTs or 
LCDs need to be interpreted with caution. 


10.4 Conclusions 


This chapter has discussed the increased use of screen-based reading in education and 
in daily life generally. Readers usually have the option of printing off screen-based 
material to be read on paper, and this seems to be popular when researchers have 
to read more serious material. Both the use of computer technology and attitudes 
to such technology seem to vary with the user’s age, but there is no support for the 
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so-called “digital natives” hypothesis. Some existing typefaces were taken over to 
use in computer systems, while other typefaces were developed specifically for on- 
screen use. The chapter discussed the legibility of serif and sans serif typefaces when 
projected by means of older technology such as slide projectors, overhead projectors, 
and PowerPoint, but this did not show any consistent difference in their legibility. 
Finally, the chapter described some of the technical issues concerning the way that 
images are displayed using CRTs and LCDs. 
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Chapter 11 A) 
The Legibility of Letters and Words ciecie; 


11.1 Reading Letters and Words in Serif and Sans Serif 
Typefaces 


One of the earliest uses of computers in vision research was as tachistoscopes for the 
brief presentation of visual stimuli. One fundamental problem was that cathode-ray 
tubes (CRTs) suffered from the decay time or persistence of the phosphor in the 
individual pixels, which led to inaccurate measurements (see Hutner et al., 1999). (It 
should be noted that traditional tachistoscopes were by no means immune to such 
problems: Mollon & Polden, 1978.) Liquid crystal displays (LCDs) are generally 
regarded as more appropriate for vision research (Wang & Nikolić, 2011). Even so, 
it is generally regarded as being good practice to use a backward-masking procedure 
in which a second stimulus or mask (perhaps only a random pattern) is presented 
after a brief interval to overwrite the visual trace of the original stimulus. 

Suen and Komoda (1986) used this approach to compare the recognition of indi- 
vidual uppercase and lowercase letters that had been digitised using an optical scanner 
from print in a slab serif typeface (Courier), a sans serif typeface (Letter Gothic), 
and the output of a dot-matrix printer. All three typefaces were monospaced or non- 
proportional (that is, each character occupied the same width). The characters were 
presented on a high-resolution CRT screen controlled by an Apple microcomputer. 
(The actual duration of the presentation was not specified.) They were followed by 
a mask after 0 ms, 16.7 ms, or 33.3 ms. If the mask was presented immediately after 
the letters, performance was best with the sans serif typeface and worst with the dot- 
matrix style. The differences among the three styles were much reduced if the mask 
was presented after a delay, but this may have been a ceiling effect, since performance 
was 80% or better with all three styles, presumably because of the additional time 
available for processing the letters. 

As mentioned in Sect. 4.1, Korean is another language where the alphabet can 
be rendered in either serif (or Ming) typefaces or sans serif (or Gothic) typefaces. 
Hwang et al. (1997) presented Korean participants with CRT screens consisting of 
12 windows, each containing letters in one of six different sizes. The participants’ 
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task was to find and read aloud a particular target letter and then to report how 
much visual fatigue they had experienced while carrying out the task. The letters 
were presented using either an unspecified Ming typeface or an unspecified Gothic 
typeface. The participants’ response times were faster, their accuracy was higher, and 
their reported visual fatigue was lower when reading letters in a sans serif typeface 
than when reading letters in a serif typeface. Even so, different results were obtained 
by Kong et al. (2011) in a study that was described in Sect. 4.1. They asked Korean 
participants to read aloud sets of four one- or two-syllable letters of varying sizes and 
to rate how much discomfort they had experienced when reading each set. The letters 
were presented either on paper or on an LCD screen using either an unspecified Ming 
typeface or an unspecified Gothic typeface. When the letters were presented on an 
LCD screen, there was no difference between the serif typeface and the sans serif 
typeface either in the participants’ performance or in their reported discomfort. 

As was mentioned in Sect. 5.1, Arditi (2004) devised software to generate type- 
faces with slab serifs of varying size. Arditi and Cho (2005) used this software to 
construct lowercase typefaces of uniform thickness with slab serifs extending 0% 
(sans serif), 5%, or 10% of the cap height (the height of capital or uppercase letters). 
In theory, the resulting typefaces should have varied only in the size of the serifs, but, 
as was mentioned in Sect. 10.1, Arditi and Cho found that an increase in the spacing 
between successive letters had been required to accommodate the serifs. They there- 
fore used an inter-letter spacing of 0%, 10%, or 40% of the cap height, yielding 
a3 x 3 design. Arditi and Cho measured size thresholds when random five-letter 
strings were presented on a CRT computer screen as black letters against a white 
background. Data were obtained from four participants with normal vision. There 
was a large effect of spacing such that closely spaced letters yielded higher thresholds 
(i.e., poorer performance). There was also a significant but small effect of serif size, 
such that serifs of 5% or 10% led to lower thresholds (i.e., better performance) than 
a sans serif typeface, which Arditi and Cho ascribed to the concomitant increase in 
spacing required to accommodate them. 

After Microsoft introduced ClearType software with the aim of tackling the 
aliasing issue in text presented on LCDs (see Sect. 10.3), a range of new typefaces 
was commissioned to try to exploit this new technology. They included two serif 
typefaces (Cambria and Constantia) and four sans serif typefaces (Calibri, Candara, 
Corbel, and Consolas, the last for use mainly in programming). Chaparro et al. 
(2006a, b, 2010) evaluated the legibility of these new typefaces in comparison with 
that of the serif typeface Times New Roman and the sans serif typeface Verdana. Nine 
participants were presented with individual characters from all eight typefaces for 
just 34 ms each (but with no backward mask) using an LCD monitor with ClearType 
software enabled and were asked to say each character’s name aloud. The proportion 
of characters reported correctly was highest for Consolas, Cambria, and Verdana and 
lowest for Times New Roman, Candara, and Corbel. Taking Times New Roman as 
the reference, accuracy was significantly better for Consolas, Cambria, and Verdana 
but not for the other four typefaces (Chaparro et al., 2010). Despite these somewhat 
ambivalent findings, Microsoft made Calibri the default typeface for all its Office 
applications in 2007 (and it remains the default at the time of writing). 


11.1 Reading Letters and Words in Serif and Sans Serif Typefaces 93 


Moret-Tatay and Perea (2011) used a lexical decision task in which Spanish 
students had to say whether or not stimuli were genuine words. Half the stimuli 
were Spanish words, and the other half were nonwords created by changing two 
letters in genuine Spanish words. Each was presented in the centre of an LCD screen 
in either a serif typeface (Lucida Bright) or a sans serif typeface (Lucida Sans). Once 
again, the typefaces should have differed only in the presence or absence of serifs, 
but, as mentioned in Sect. 10.1, Moret-Tatay and Perea noted that the serifs had 
occupied some of the space between the letters, and so removing the serifs led to 
a slight increase in the inter-letter spacing. They found that participants responded 
significantly more quickly to words presented in a sans serif typeface than to words 
presented in a serif typeface, and they suggested that this might have been due to the 
slight increase in inter-letter spacing. 


11.2 The “‘Stripiness” of Words Displayed on Screens 


Section 4.2 described research by Wilkins et al. (2007) which measured the vertical 
“stripiness” of a word by the height of the first peak of the autocorrelation between 
an image of the word and a second, horizontally displaced image of the same word. 
They had found that words with a higher first peak (i.e., more stripy words) were read 
more slowly than words with a lower first peak (i.e., less stripy words). However, 
it was not clear whether this led to variations in how quickly words in different 
typefaces were read. 

Liversedge et al. (2006) had displayed sentences to 15 students as white letters on 
a black background using a CRT screen and monitored the movements of both their 
eyes. They found that in normal binocular reading the two eyes were often misaligned 
after a saccade, so that part of the duration of the subsequent fixation was taken up 
correcting this disparity in order to achieve binocular vergence. Jainta et al. (2010) 
presented 32 German participants with 120 unrelated sentences in blocks of 10 to 
read silently from a CRT screen. They found that the participants achieved better 
binocular vergence when the sentences contained words with a higher first peak, but 
that this took longer to achieve and led to a longer overall fixation duration. They 
argued that these findings explained the longer overall reading time for words with 
higher first peaks. 

Wilkins et al. (2020) observed that different typefaces appeared to vary in the 
periodicity of their letters’ vertical strokes. In two serif typefaces, Times and Palatino, 
the letter strokes were relatively evenly spaced, whereas in two sans serif typefaces, 
Arial and Verdana, the spaces between the strokes within a letter were greater than 
the spaces between the letters, leading to low periodicity. Wilkins et al. determined 
the first peak of the horizontal autocorrelation for passages from two novels when 
they were printed to the Retina (LCD) screen of an Apple Macbook Pro in each of 
nine serif typefaces and in each of 11 sans serif typefaces. They found that the first 
peak of the horizontal autocorrelation was significantly greater for the serif typefaces 
than for the sans serif typefaces. Wilkins et al. argued that this difference in the first 
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peak of the horizontal autocorrelation was not due to the serifs themselves but to the 
effect of the serifs on the rhythm or periodicity of the typefaces. However, Wilkins 
et al. did not provide any evidence that these differences led to significant variations 
in how quickly words in different typefaces were read. 


11.3 Confusions Among Letters in Serif and Sans Serif 
Typefaces 


In their original study involving the presentation of uppercase and lowercase letters 
in either a slab serif typeface or a sans serif typeface (mentioned in Sect. 11.1), 
Suen and Komoda (1986) observed that with both typefaces errors often consisted of 
confusions between uppercase and lowercase forms of the same letter or confusions 
between letters that were visually similar in the same case. The total number of 
confusions was similar in the two typefaces, but there were certain differences in the 
pattern of confusions. For instance, with the sans serif typeface, lowercase letters 
tended to be mistaken as their uppercase counterparts rather than vice versa, but 
the reverse tended to be true for the slab serif typeface. Suen and Komoda ascribed 
these trends to the design of the characters in the relevant typefaces rather than to 
the presence or absence of serifs. 

Using a laptop computer with an LCD screen, D. Fox, Chaparro, and Merkle 
(2007) presented individual characters (the 26 lowercase letters plus the digits 0-9 
and 1 1 common mathematical and scientific symbols) to ten participants in ClearType 
rendering for just 34 ms (but with no backward mask) in each of 20 different typefaces. 
The participants were asked to say each character aloud and were scored on their 
accuracy. Fox et al. focused on errors for the letter e (which is confusable with the 
letter c and the number 0) and for the number 0 (which is confusable with the letters 
e and o). For the letter e, the most accurate performance was obtained with the sans 
serif typefaces Clearview Text and Verdana, and the least accurate performance was 
obtained with the serif typeface Garamond. For the number 0, the most accurate 
performance was obtained with the serif typefaces Centaur and Rockwell, and the 
least accurate performance was obtained with the serif typeface Constantia. 

In further results from this study, Fox et al. (2008) compared the data for the 
numbers 0 and 7 (which is confusable with the letter /). For the number /, the most 
accurate performance was obtained with the sans serif typeface ClearView Text, and 
the least accurate performance was obtained with the serif typeface Centaur. Fox et al. 
employed classification tree analysis to identify the physical features of different 
typefaces that might be responsible for variations in legibility for both numbers, 
although they did not include the presence or absence of serifs as a feature in these 
analyses. Taken together, the findings of this study suggest that some characters 
tend to be more legible when presented on-screen in sans serif typefaces than when 
presented on-screen in serif typefaces, whereas the opposite is true for some other 
characters. 
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Beier and Larson (2010) constructed three new artificial typefaces. In each case, 
they varied the presence or absence of slab serifs with no other changes to the letters 
themselves. In a pre-test, each of 34 participants viewed the letter d presented on the 
LCD screen of a laptop computer that had been placed on a podium at eye-level height 
at a distance of 10 m. They approached the podium until they could correctly identify 
the letter. The relevant distance was then used for the presentation of individual 
uppercase and lowercase letters in the main experiment. Beier and Larson found 
that serifs could serve either to enhance or to impair the relative differentiation of 
individual letters. For instance, a slab serif added to the top of the stem of the letter i 
led to improved identification, but this was not the case when slab serifs were added to 
both the top of the stem and the baseline. In Sect. 4.2, it was noted that similar findings 
had been obtained with the identification of individual characters when reading from 
print (Harris, 1973; Tinker, 1963, p. 36). However, Vernon’s (1929) findings, again 
based on reading from print, imply that such confusions would be much less likely 
if letters were presented in the context of meaningful text, as in normal reading. 


11.4 Conclusions 


As with reading from print, the earliest research on the legibility of different typefaces 
when reading from screens was concerned with recognising individual letters and 
words under different conditions. Studies that employed authentic typefaces showed 
at most that some sans serif typefaces are more legible than some serif typefaces. 
Research using artificial typefaces suffers from confounding between (a) the presence 
or absence of serifs and (b) variations in the width of the letters and the spacing 
among successive letters. The horizontal autocorrelation of individual words differs 
across different typefaces, but it is not clear whether this leads to differences in how 
quickly different typefaces can be read. The vertical stripiness of serif typefaces 
tends to be greater than that of sans serif typefaces, but there is little evidence that 
this leads to variations in how quickly words in different typefaces are read. As 
with printed letters and words, identification errors are often the result of confusions 
among visually similar letters, but visual confusions are not more likely with sans 
serif typefaces than with serif typefaces, contradicting an old hypothesis that serifs 
make letters easier to discriminate (Legros, 1922, p. 11). 
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Chapter 12 A) 
Reading and Comprehending Text get 


12.1 Reading Text in Serif and Sans Serif Typefaces 


The work by Gould et al. (1987) was mentioned in Sect. 10.3. In another exper- 
iment, Gould et al. followed a number of previous studies comparing screen and 
paper presentation in a proof-reading task: 18 research workers read through arti- 
cles of about 1,000 words to try to locate between 1 and 10 misspelled words by 
saying them aloud to the experimenter. The screen versions were presented as anti- 
aliased characters on a cathode-ray tube (CRT) screen using a system which had been 
specifically designed to simulate their appearance on paper, and high-quality printed 
versions were generated using the same script files. The screen and paper versions 
of each article were both presented using a serif typeface (Press) and two sans serif 
typefaces (Letter Gothic and Univers). Each participant read one article on-screen 
and one article on paper in each of the three typefaces. Overall, the paper versions 
were read significantly faster than the screen versions, although this might have been 
partly because the screen system required 1.5 s scrolling time to go from one page 
of text to the next. There was no significant difference in the participants’ accuracy 
between the two versions. There was no significant variation across the typefaces 
and no significant interaction between the effects of display mode and typeface in 
either speed or accuracy. 

Also using a CRT screen, Tullis et al. (1995) compared the legibility of four 
typefaces available in the Microsoft (MS) Windows operating environment: two serif 
typefaces, MS Serif and Small Font, and two sans serif typefaces, Arial and MS Sans 
Serif. Each was used in two, three, or four different sizes, yielding 12 combinations 
of typeface and size. Each of these was shown in either a bold or regular style 
and against a white or grey background, resulting in a total of 48 conditions that 
were administered in a random sequence. All of the participants viewed the same 
paragraph of text presented in each condition and were asked to count the number 
of typographical errors that it contained. After they had read the paragraph, they 
pressed the “Enter” key and reported the number of errors using a dialogue box. 
Tullis et al. measured the time taken to read the paragraph and whether or not the 
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correct number of errors had been reported. The two sans serif typefaces yielded 
faster reading times, higher accuracy, and higher ratings than the two serif typefaces, 
especially with larger type sizes. 

Garcia and Caldera (1996) asked ten students to read aloud paragraphs of 30 
words from a computer screen (neither the computer nor the monitor was specified). 
The paragraphs were presented in three different type sizes and in five different 
typefaces: a serif typeface (Times New Roman), three sans serif typefaces (Arial, 
System, and MS Sans Serif), and a cursive typeface (i.e., a typeface intended to 
mimic handwriting) (Lucida Casual). There was significant variation across the 15 
conditions, with 10-point Arial yielding the fastest reading time. System yielded the 
second fastest time, but no further results were reported. 

Stone et al. (1999) asked 48 female survey interviewers to read aloud 24 sets of 
material, each consisting of 30 random words chosen to be at the eighth-grade reading 
level. Half of the sets were presented in black ink on white paper, whereas the other 
half were presented in black type against a white background on the liquid crystal 
display (LCD) screen of a laptop computer to simulate the former. In both cases, 
each set of words was presented on a single page, and the order of the conditions was 
counterbalanced. The sets of words were presented in three different typefaces: a sans 
serif typeface (Helvetica), a serif typeface (Times Roman), and a slab serif typeface 
(Courier). Stone et al. found that their participants read the sets in serif typefaces 
faster than the sets in sans serif typeface, but there was no significant variation in the 
number of errors made. There was no significant difference in either reading speed 
or accuracy between the sets presented on paper and those presented on-screen, and 
neither of the interactions with the effects of typeface and mode of presentation was 
significant. 

Josephson (2008) carried out an exploratory study of eye-movements in reading 
from screens. She presented six participants with four news stories of around 250 
words on a high-resolution monitor, and their eye-movements were tracked using a 
corneal reflection system. The stories were presented in the four different typefaces 
shown in Fig. 10.1 in Sect. 10.1. Afterwards, the participants were asked to rate 
each typeface on several 10-point scales and then to say which was easiest to read 
and which they liked the most. The story presented in Verdana was read the fastest, 
followed by that presented in Times New Roman. The story presented in Times 
New Roman yielded the fewest fixations, whereas that presented in Verdana yielded 
the most. Conversely, the story presented in Verdana yielded the fewest backward 
movements to reread previously presented words, whereas that presented in Times 
New Roman yielded the most. Verdana was rated highest whereas Times New Roman 
was rated lowest of the four typefaces. When asked which typeface was the easiest to 
read, three of the participants chose Verdana, whereas none chose Times New Roman. 
Apart from the small number of participants, one limitation of this study was that one 
story was assigned to each typeface, and so differences among the typefaces were 
directly confounded with differences among the news stories themselves. 

Banerjee and Bhattacharyya (2011) presented 40 young postgraduate researchers 
with 18 passages of roughly the same length on an LCD monitor. The passages 
were presented in one of three serif typefaces (Times New Roman, Georgia, and 
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the slab serif Courier New) or one of three sans serif typefaces (Arial, Tahoma, and 
Verdana) in one of three sizes (10, 12, or 14 points). The passages were assigned at 
random to the 18 conditions for each participant and presented in a different random 
order to each participant. The time taken by the participants to read each passage 
was recorded, they then rated the overall ease or difficulty of reading each passage, 
and finally they completed a short questionnaire on the mental workload involved in 
reading each passage. Their eye movements were also tracked using a binocular eye 
movement recorder. 

There was a significant interaction between the effects of typeface and type size on 
the participants’ reading time. The effect of type size was significant for Courier New 
and Arial, but not for the other four typefaces. The mean reading time was signifi- 
cantly less for the serif typefaces than the sans serif typefaces, but only for 10-point 
and 14-point sizes. Courier New and Arial also yielded the fastest average reading 
time. Verdana in 14-point was rated as easier to read than any other combinations 
of typeface and size, and Arial in 14-point was rated as the second most preferred. 
Verdana in 14-point was also rated as having the least mental workload, followed 
by Arial in 14-point and Tahoma in 14-point. Courier New yielded the lowest fixa- 
tion duration, followed by Verdana and Arial; Courier New also yielded the lowest 
total gaze duration, followed by Verdana. In contrast, Times New Roman yielded 
the longest fixation duration and the longest total gaze duration. However, there 
were no overall differences between serif and sans serif typefaces in eye movement 
parameters. 

Perea (2013) presented 24 Spanish undergraduate students with individual 
sentences on a computer screen and measured their eye movements using a video- 
based eye-tracking device. Each sentence appeared when the participant looked at a 
square on the left-hand side of the screen, and the participant pressed a key when they 
had finished reading the sentence to themselves. To check on their comprehension, 
a yes/no question was presented after 20% of the sentences. (Overall, 96% of these 
questions were answered correctly.) The sentences were presented in four blocks of 
30. Half were presented in the serif typeface Lucida, whereas half were presented 
in the sans serif typeface Lucida Sans. There was no significant difference in the 
participants’ reading times, in the total number of fixations, or in the mean duration 
of their fixations between the serif and sans serif sentences. Perea concluded that 
the presence or absence of serifs did not materially affect the process of normal 
reading, although different results might have been obtained with longer extracts, as 
in Josephson’s study. 


12.2 Comprehending Text in Serif and Sans Serif Typefaces 


As with research on reading from paper (see Chap. 5), asking the participants to read 
continuous text provides less opportunity for researchers to impose experimental 
control over their reading behaviour. Some researchers have therefore focused on 
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their participants’ comprehension of meaningful material read from screens rather 
than upon its legibility per se. 

Williams (1990) carried out an experiment in which 56 students read two passages 
on a CRT monitor and answered four comprehension questions after each passage. 
Each passage contained 650-700 words and filled seven screens, so that the students 
had to scroll down to read the complete passages. The material was presented in either 
a serif typeface (10-point Times Roman) or a sans serif typeface (9-point Helvetica); 
the use of different type sizes ensured that the typefaces were of similar x-height. 
The participants were randomly assigned to five different groups. Four groups saw 
the two passages in different typefaces, with the order of the two passages and the 
order of the two typefaces counterbalanced across the four groups. The fifth group 
saw both passages in the sans serif typeface to check for any difference in difficulty 
between the two passages. The monitor screen also contained a clock, and the students 
were asked to record the times when they started and finished reading each passage. 
The comprehension questions were multiple-choice with five alternatives. The fifth 
(control) group showed no difference in mean reading rate between the two passages. 
Across the other four groups, the mean reading rate was 14.75 words/min faster for the 
sans serif typeface than for the serif typeface, but the difference was not statistically 
significant. 

Lenze (1991) used a personal computer to present 84 students with a brief para- 
graph and then asked them a question based on its content to which they would 
have to type a one-word answer. The paragraph was initially presented for 1 s; if 
the participant answered incorrectly, the paragraph was presented for 1 s longer, and 
this process was continued until they answered the question correctly. They were 
then presented with three further paragraphs in the same way. Half the participants 
chosen at random were shown the paragraph in a serif typeface, and the other half 
were shown the paragraph in a sans serif typeface. (Neither of the typefaces was spec- 
ified.) Finally, all the participants were shown examples of text in both serif and sans 
serif typefaces and were asked which they preferred. Lenze found that there was no 
sign of any significant difference between the two groups of participants in the time 
needed to achieve comprehension of the texts, which suggested that serif typefaces 
and sans serif typefaces were equally effective in supporting reading comprehension. 
Even so, 77% of the students preferred the sans serif typeface to the serif typeface. 

Boyarski et al. (1998) compared the serif typeface Georgia and the sans serif 
typeface Verdana, both of which had been designed for on-screen display. Sixteen 
university staff and graduate students read two passages from a standard reading test 
presented in the two typefaces in the same body size in Microsoft Word without anti- 
aliasing. The orders of the typefaces and the passages were both counterbalanced. 
They were allowed 1 min to read each passage and were then asked four questions 
to test their comprehension of the passage (each answer being scored between 0 and 
3). Finally, the participants were asked to compare the two typefaces. A measure of 
“effective reading speed” was obtained by dividing their comprehension score by 
their actual reading time on each passage in order to allow for the possibility of a 
trade-off between speed and accuracy. Their comprehension scores were significantly 
higher for passages in Georgia than for passages in Verdana. Nevertheless, there was 
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no significant difference between the two conditions either in their actual reading 
time or in their effective reading speed. The participants judged the passage presented 
in Verdana to be easier to read than the passage presented in Georgia, but there was 
no significant difference in their ratings of the passages’ sharpness or legibility. 

Hojjati and Muniandy (2014) presented 30 international students at a Malaysian 
university with four 200-word English texts in different typefaces. Two were 
presented in the serif typeface Times New Roman, and two were presented in the sans 
serif typeface Verdana; in each case, one text was presented single-spaced, and the 
other was presented double-spaced. After each text, the students were presented with 
questions designed to test their retention of its content. Finally, they were asked to rate 
the ease with which they had been able to read each text on a 6-point scale. Regardless 
of spacing, they found text displayed in Verdana easier to read than text displayed 
in Times New Roman, they read text displayed in Verdana more quickly, and they 
recalled more of the content of text displayed in Verdana. However, the researchers 
had used Amazon Kindles (which use microcapsules containing electronic “ink”) 
rather than LCDs, and at least some of the texts did not fit completely onto the visible 
area of the screens (p. 167). The material had been taken from Wikipedia and other 
sources and had been checked by subject experts, but the texts themselves suffered 
from grammatical and stylistic problems. It is also not clear whether the typefaces 
used for different texts were counterbalanced across participants or whether each 
text was always presented in the same typeface. 

Csilla et al. (2016) asked 74 Hungarian students to read to themselves four self- 
contained excerpts from a Hungarian translation of The Nature of Space and Time 
by Hawking and Penrose (1999). The excerpts ran for four pages in the original 
book but had been reformatted to fit onto two pages of European A4-sized paper. 
Each excerpt was prepared in six different typefaces and then saved in Portable 
Document Format. Two excerpts were presented on paper, and two were presented 
on an LCD computer monitor. In each case, one excerpt was presented in a serif 
typeface randomly chosen from Book Antiqua, Garamond, and Times New Roman, 
and the other was presented in a sans serif typeface randomly chosen from Arial, 
Calibri, and Verdana, so that each student saw four different typefaces. The order 
of the four texts was randomised for each student. The students timed themselves 
reading each excerpt and then answered a number of questions about its content. 

Csilla et al. analysed the data for each of the four excerpts separately and carried 
out independent-groups tests comparing (a) the participants who had read the excerpt 
on paper with those who had read the excerpt on-screen and (b) the participants who 
had read the excerpt in a serif typeface with those who had read the excerpt in a 
sans serif typeface. There were no significant differences in either reading time or 
comprehension for any of the four excerpts. It is unfortunate that Csilla et al. did not 
increase the statistical power of their analysis by using the data from all four excerpts 
and carrying out repeated-measures tests on the variables of presentation medium 
and typeface within the 74 participants. They also did not examine whether there 
was any interaction between the effects of these two variables on either reading time 
or comprehension. 
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12.3 Rapid Serial Visual Presentation 


The introduction of new technologies that enable images to be presented in a variety 
of visual forms prompted new research methods to be used for investigating reading. 
One such method is that of rapid serial visual presentation (RSVP), in which letters, 
words, or groups of words are presented one at a time at the reader’s point of fixation. 
This method was first developed by Gilbert (1959a, b), who presented sentences 
containing eight words using a movie projector to each of 76 participants and asked 
them to write down immediately afterwards what they had seen. He found that they 
were much more accurate when successive pairs of words were shown at the point of 
fixation than when successive pairs of words were presented across the screen as if in 
a line of text. Gilbert concluded that the former procedure enhanced the identification 
of stimuli by eliminating the need for eye movements. 

Rubin and Turano (1992) digitised characters from the output of a laser printer 
using a Times Roman typeface and then used them in text displayed one word at a 
time on a high-resolution screen controlled by an IBM computer. They found that 
readers who were previously unfamiliar with this procedure could achieve reading 
speeds that were several times faster than when reading the same material presented 
on-screen as a single paragraph, typically in excess of 1,000 words/min for reading 
aloud or as fast as 1,800 words/min for silent reading. 

As was mentioned in Sect. 11.1, Suen and Komoda (1986) had digitised printed 
characters in a slab serif typeface (Courier), a sans serif typeface (Letter Gothic), and 
the output of a dot-matrix printer. They used these characters to present paragraphs of 
roughly 160 words drawn from magazine articles to 36 participants. Each paragraph 
was shown on a high-resolution CRT screen, one word at a time for just 100 ms. 
Multiple-choice questions (how many was not specified) were then employed to 
assess the participants’ comprehension of the paragraph. The dot-matrix style yielded 
the worst performance, but there was no difference between the comprehension of 
material presented in the slab serif typeface and that presented in the sans serif 
typeface. Suen and Komoda argued that the higher-order skills and strategies involved 
in reading connected text had overridden any differences in legibility between the 
serif and sans serif typefaces. 

Yager et al. (1998) used the RSVP method to compare the legibility of two type- 
faces matched for x-height: Dutch, a serif typeface similar to Times; and Swiss, a sans 
serif typeface similar to Helvetica. Sentences were presented on a CRT monitor in 
white letters on a black background under conditions of either high luminance or low 
luminance. The latter was intended to stress the visual system and hence to simulate 
the situation of readers with visual impairment. The participants were 46 normally 
sighted college and high-school students who were asked to read each sentence aloud 
immediately after it had been presented. The presentation rate was calibrated to iden- 
tify the threshold for correct responding for each participant. Under high luminance, 
there was no significant difference between the number of sentences read correctly 
in the two typefaces. Under low luminance, performance was significantly better 
with the Swiss typeface. Yager et al. noted that in this situation the Dutch typeface 
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was close to the threshold of visual acuity. They speculated that under conditions 
of low luminance either the serifs or the thinner strokes of the letters of the Dutch 
typeface tended to be invisible, and that this was responsible for the significantly 
poorer reading performance. 

Also using RSVP, Morris et al. (2002) presented 27 participants with words that 
made up meaningful but unconnected sentences. The words were presented on a CRT 
monitor in slab serif and sans serif typefaces taken from Bigelow and Holmes’ (1986) 
Lucida styles to ensure that the typefaces were matched in all respects except for 
the presence or absence of serifs. When the words were presented at the equivalent 
of 14-point type (a normal reading size), there was no difference in the number 
that were read correctly in the slab serif and sans serif typefaces. However, when 
they were presented at the equivalent of 4-point type (a small but still tolerable 
size), performance was better with the sans serif typeface than with the slab serif 
typeface. Morris et al. suggested that the letters appeared relatively crowded in the 
latter situation, and that rendering serifs in small sizes might be counterproductive. 

As mentioned in Sect. 5.1, Arditi and Cho (2005) devised lowercase typefaces 
of uniform thickness with slab serifs that extended for 0% (sans serif), 5%, or 10% 
of their cap height. They presented sentences in these typefaces on a CRT display 
screen using RSVP and adjusted the presentation speed for each participant to ensure 
a 50% correct reading rate. Data were obtained from two participants with normal 
vision and two with impaired vision. The former participants achieved higher reading 
speeds than the latter participants, but there was no effect of serif size and hence no 
effect of the presence versus the absence of serifs. Arditi and Cho acknowledged that 
the small number of participants was a limitation of their study. 

One motivation for investigating the RSVP procedure was that it was suspected it 
might help to compensate for the limitations of handheld devices with small screens, 
such as cellular phones, palmtop computers, and personal digital assistants (Bernard, 
Chaparro, & Russell, 2000; De Bruijn et al., 2002). Palmtop computers and personal 
digital assistants were fairly popular towards the end of the twentieth century, but 
during the 2000s their functionality was superseded by that of smartphones. It is thus 
not surprising that the RSVP procedure also became less popular as a research tool. 
In any case, a major criticism of the procedure is that it lacks ecological validity, 
insofar as it does not represent a situation that is characteristic of normal reading in 
real-life settings (Perea, 2013). 


12.4 Reading Material on Handheld Devices 
and Smartphones 


It has tended to be assumed that sans serif typefaces are more legible than serif 
typefaces when used on handheld devices or smartphones. The sans serif typefaces 
Droid and Roboto were developed for Android mobile phones, while the sans serif 
typeface San Francisco was devised for Apple products, although all three typefaces 
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are available in serif styles. The limited amount of research comparing the legibility of 
serif and sans serif typefaces when employed on handheld devices and smartphones 
has been carried out by Korean researchers. 

Park et al. (2008) presented texts to four Korean students by means of a personal 
digital assistant (a Pocket PC 2002). They were asked to read the texts silently, 
and their eye movements were monitored using an eyeball-tracking camera which 
reflected infrared rays on their corneas. The texts were shown in three type sizes 
and in three typefaces: two serif typefaces (Batang and Gungseo) and one sans serif 
typeface (Gulim). The participants’ fatigue was measured both by monitoring their 
blink rate and by asking them to rate how easy or difficult it had been to read each 
text on a 7-point scale. There was no significant variation across the three typefaces 
in their reading speed, their error rate, their eye-blinking, or their subjective ratings. 

Kim et al. (2015) presented 14 Korean students with pairs of two-syllable words 
using an Apple iPhone with a display of 90 mm by 50 mm using one of two serif 
typefaces (Batang or Gungseo) or one of two sans serif typefaces (Dodum or Gulim). 
There were 10 trials for each typeface, and the order of the typefaces was counter- 
balanced across the participants. On each trial, one member of the word pair had 
been designated as the target, whereas the other was a distractor, and the partici- 
pants’ task was to read aloud the target in each pair. Their mean reading time was 
marginally faster for the words shown in serif typefaces than for words shown in sans 
serif typefaces, but the difference between the two means was not at all statistically 
significant. 


12.5 Connotations of Serif and Sans Serif Typefaces 


Several of the studies mentioned asked participants to express a preference between 
serif and sans serif typefaces. Misanchuk (1989) argued that, in the absence of 
evidence for objective differences in legibility between typefaces, designers might be 
guided by readers’ preferences or satisfaction ratings. In fact, several studies found 
a significant preference for sans serif typefaces over serif typefaces (Boyarski et al., 
1998; Hojjati & Muniandy, 2014; Lenze, 1991; Tullis et al., 1995), some found no 
significant difference (Garcia & Caldera, 1996; Holleran, 1992; Muter & Mauretto, 
1991), but none found a significant preference for serif typefaces over sans serif 
typefaces. 

Savory et al. (2012) found a clear preference for sans serif typefaces among mili- 
tary experts in the highly specialised area of the design of radiation detector screens. 
They used a focus group to discuss other aspects of these devices, a methodology 
where responses can be vulnerable to peer pressure from the other participants. 
However, the participants expressed their preferences among the different typefaces 
in an individual written survey, which should have avoided this problem. 

Section 5.4 noted research that had focused on the connotations of different type- 
faces when used for print-based text, and analogous research has been carried out on 
the connotations of different typefaces when presented on computer monitors. Shaikh 
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et al. (2006) examined the connotations of 20 different typefaces; they included four 
serif and six sans serif typefaces, among which were the six typefaces introduced 
by Microsoft to make use of ClearType software (see Sect. 10.3). Using an online 
survey, more than 500 participants rated the characteristics of each typeface using 
15 bipolar scales and then indicated whether they would use the typeface for each of 
25 online purposes. 

The serif typefaces were regarded as being stable, practical, mature, and formal, 
but the sans serif typefaces were not regarded as being especially high or low on any 
of the 15 traits. The serif typefaces were regarded as being appropriate for business 
documents, website text, or online magazines but not for digital scrapbooking, chil- 
dren’s documents, or e-greetings. The sans serif typefaces were regarded as being 
appropriate for website text, e-mail, or online magazines but not for scrapbooking, 
computer programming, or mathematical documents. Shaikh et al. concluded that 
computer users consistently attributed personalities to typefaces displayed on-screen 
and that both serif typefaces and sans serif typefaces were seen as being appropriate 
for the kinds of material that were typically read on-screen. 

Shaikh (2007, pp. 44—100) carried out another online survey in which 379 partic- 
ipants rated text samples of 40 typefaces on 15 bipolar constructs. The typefaces 
included ten examples of each of four categories: serif, sans serif, display (used in 
advertisements or logos), and cursive (akin to handwriting). Each participant was 
shown 20 typefaces including five serif typefaces, five sans serif typefaces, five 
display typefaces, and five cursive typefaces; these were randomly selected and 
presented in a random order. The participants also rated the legibility of each type- 
face using a 7-point scale. There were significant differences between the serif and 
the sans serif typefaces on five of the scales: the serif typefaces were regarded as more 
delicate, beautiful, expensive, warm, and old, whereas the sans serif typefaces were 
regarded as more rugged, ugly, cheap, cool, and young. However, these differences 
were small in magnitude, and in general the serif and sans serif typefaces tended to 
be regarded as similar. The participants’ ratings of legibility showed no significant 
difference between the serif and the sans serif typefaces, but both were rated as being 
significantly more legible than the display typefaces or the cursive typefaces. 

Koch (2012) asked 100 volunteers to evaluate six typefaces in an online survey: 
five were variants of a sans serif typeface, Helvetica, and one was a slab serif typeface, 
Glypha Medium. For each typeface, participants were presented with the uppercase 
and lowercase alphabets, together with a grid showing 12 cartoon characters, each 
representing a different emotion (although the emotions themselves were not named). 
When they clicked on each character, they were shown a short animation in which 
the character enacted the relevant emotion; they were also shown a 5-point scale 
and indicated how much the relevant typeface had aroused the emotion in them 
(from “I do not feel this” to “I feel this strongly”). Out of the 100 volunteers, 42 
completed the survey (yielding 12 ratings for each of six typefaces), of whom 32 had 
had some prior training in design. Comparisons between the overall ratings given 
to the slab serif typeface and the sans serif typefaces yielded just one significant 
comparison, in that Glypha Medium yielded higher ratings of satisfaction than the 
average across the different variants of Helvetica. However, Koch had carried out 48 
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different comparisons in total, suggesting that even this effect might well have been 
a spurious result due to chance variation (i.e., a Type I error). 

Kaspar et al. (2015) carried out an online survey in which 188 students evaluated 
three scientific abstracts on six dimensions. The students were randomly assigned to 
receive the abstracts and the rating scales either in the serif typeface Lucida Bright or 
in the sans serif typeface Lucida Sans. (As mentioned in Sect. 11.2, these typefaces 
are identical except for the presence or absence of serifs.) There was no significant 
difference in the time taken to read the abstracts or in the time taken to complete 
the rating scales between the groups who received material in the two typefaces. 
The students who read the abstracts in the serif typeface rated them significantly 
more positively than the students who read them in the sans serif typeface: they rated 
them as significantly more comprehensible and more appealing overall, and they 
were significantly more interested in reading the full paper from which the abstract 
had been taken; they also rated the topicality, the quality, and the importance of 
the research question significantly more highly. Kaspar et al. noted that reports of 
scientific work were more often presented in a serif typeface, and they concluded that 
the use of such a typeface increased the impression of a work’s scientific character. 

Kaspar et al. (2015) carried out a similar study in which 187 students were 
randomly assigned to receive abstracts and rating scales either in the serif type- 
face Times New Roman or in the sans serif typeface Arial. (These typefaces differ 
on several other characteristics as well as in the present or absence of serifs.) Once 
again, there was no significant difference in reading speed between the groups who 
received material in the two typefaces. However, in this case, the participants who 
read the abstracts in the sans serif typeface rated them significantly more positively 
than the students who read them in the serif typeface: they rated them as significantly 
more comprehensible and more appealing overall, and they were significantly more 
interested in reading the full paper from which the abstract had been taken; they 
also rated the quality of the research question significantly more highly, although 
there was no significant difference between the students who read the abstracts in 
different typefaces in their ratings of either the topicality or the importance of the 
research question. Kaspar et al. concluded that there was no simple rule of thumb 
that favoured the presence or absence of serifs under all circumstances and that sans 
serif typefaces could lead to more desirable text evaluations when other features of 
the text compensated for the missing serifs. 


12.6 Conclusions 


Studies of the legibility of connected sentences that have measured readers’ speed and 
accuracy have proved inconclusive. Research on reading text presented on computer 
screens has enabled investigators to use other forms of technology such as eye- 
tracking equipment. However, research into readers’ eye movements has not yielded 
conclusive findings with regard to the presence or absence of serifs. As with research 
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on reading from paper, asking participants to read continuous text provides less oppor- 
tunity for researchers to impose experimental control over their reading behaviour. 
Some researchers have instead focused on their participants’ comprehension of mate- 
rial. Such studies have not yielded consistent findings with regard to the presence or 
absence of serifs on readers’ speed or accuracy, and at least one study suffered from 
serious methodological problems. 

A particular device that has been investigated is the presentation of letters, words, 
or groups of words one at a time at the reader’s point of fixation: RSVP. This was orig- 
inally thought to compensate for the limitations of handheld devices. Five studies 
have compared readers’ comprehension, speed, or accuracy and found no signifi- 
cant differences except that serif styles were less legible with very small type or 
under conditions of low luminance when, of course, serifs are likely to have been 
faint or even completely invisible. It has tended to be assumed that sans serif type- 
faces are more legible than serif typefaces when used on handheld devices or smart- 
phones. However, two studies failed to find any difference between serif and sans 
serif typefaces in terms of the participants’ reading performance when using such 
devices. 

Finally, this chapter described research on the connotations of different type- 
faces when presented on computer screens. Some studies, but not all, have found 
that readers express a preference for sans serif typefaces when reading on computer 
screens, but in general both serif and sans serif typefaces are regarded by users as 
appropriate for online purposes. There is some evidence that serif and sans serif type- 
faces differ in their connotations or “personality”. These differences seem to reflect 
variations in readers’ expectations, which in turn depend on their prior experience 
and familiarity with different typefaces. 
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Chapter 13 A) 
Readers with Disabilities E 


13.1 Readers with Visual Impairment 


Plass and Yager (1995) tested 61 patients with visual impairment due to a variety of 
causes. Almost all preferred to read text shown on a computer monitor rather than 
printed on paper. Their reading rates were significantly faster when the individual 
words were presented on a screen using the RSVP procedure than when they were 
presented as a single paragraph. Although they had presented the text using both 
the serif typeface Times Roman and the sans serif typeface Arial, they found no 
significant differences in reading speed between the two typefaces (see Yager et al., 
1998). Four of their patients were adults with a history of congenital nystagmus; as 
mentioned in Sect. 8.2, this causes visual impairment by disrupting the normal pattern 
of eye movements. A follow-up study by Aquilante et al. (1997) confirmed that these 
four patients tended to read faster with RSVP than when reading the same material 
as a single paragraph of text. This suggested that techniques which eliminated the 
need for eye movements might be useful for improving the reading ability of people 
with nystagmus or other visual disorders. However, Aquilante et al. did not compare 
their patients’ performance using different typefaces. 


13.2 Readers with Dyslexia 


Research on people with dyslexia reading from print was described in Sect. 8.7. It was 
noted that the British Dyslexia Association (2018) had for many years recommended 
that materials for people with dyslexia should use sans serif typefaces, and this advice 
seems intended to apply both to material shown on screens and to material printed 
on paper. 

Rello and Baeza- Yates (2013) recruited 48 participants between the ages of 11 and 
50 who had been clinically certified as dyslexic and asked them to read 12 extracts 
of 60 words from a Spanish novel. These were presented in 12 different typefaces, 
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including three serif typefaces (Computer Modern Unicode, Garamond, and Times) 
and four sans serif typefaces (Arial, Helvetica, Myriad, and Verdana). Assignment 
of the different typefaces to different extracts was counterbalanced across different 
participants. The extracts were presented on a liquid crystal display (LCD) screen 
in 14-point size, and the readers’ eye movements were monitored using a corneal 
reflection system, both to record the duration of their fixations and to record their 
total reading time. After each extract, the participants answered a multiple-choice 
question to check their comprehension of the text, and at the end of the experiment 
they rated their preference for each typeface on a 5-point scale. 

There was significant variation among the 12 typefaces in the participants’ reading 
time, but there was no overall difference between the serif typefaces and the sans serif 
typefaces in this regard. There was significant variation among the 12 typefaces in the 
participants’ fixation duration, and this was slightly longer for the serif typefaces than 
for the sans serif typefaces. There was significant variation among the 12 typefaces 
in the participants’ preference ratings, but there was no overall difference between 
the serif typefaces and the sans serif typefaces in this regard. Rello and Baeza- 
Yates argued that shorter fixations reflected a reduced processing load and hence 
greater legibility, and they concluded that sans serif typefaces enhanced reading 
performance, even though this was not reflected in an enhanced reading time. 

However, Rello and Baeza-Yates had not controlled the spacing of their different 
typefaces. It was noted in Sect. 8.6 that readers who are dyslexic benefit from 
increased spacing when reading from paper, and that this explains differences in 
their performance across different typefaces. It was also noted in Sect. 10.1 that the 
addition of serifs leads to an increase in inter-letter spacing, at least when letters 
are presented on screen. Perea et al. (2012) found that a small increase in inter- 
letter spacing could lead to enhanced performance when reading from computer 
screens, especially in children with dyslexia. Consequently, any suggestion in Rello 
and Baeza-Yates’ results in favour of sans serif typefaces is likely to be due to the 
confounding of typeface with inter-letter spacing. Otherwise, the results of their 
study indicate no difference in legibility between serif and sans serif typefaces in 
dyslexic readers, and once again this contradicts the advice of the British Dyslexia 
Association (2018). 


13.3 Readers with Age-Related Macular Degeneration 


Several studies have investigated people with age-related macular degeneration 
(AMD). As was mentioned in Sect. 8.2, this causes impaired vision in the centre of the 
visual field. Section 11.1 described a study in which Arditi and Cho (2005) measured 
size thresholds when random five-letter strings were presented on a cathode-ray tube 
(CRT) screen as black letters against a white background. They used software to 
construct lowercase typefaces of uniform thickness with slab serifs extending 0% 
(sans serif), 5%, or 10% of the cap height (the height of capital or uppercase letters). 
They also used an inter-letter spacing of 0%, 10%, or 40% of the cap height, yielding 
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a3 x 3 design. In addition to four participants with normal vision, Arditi and Cho also 
tested two people with AMD. They normalised the data to each participant’s best 
performance by dividing each data point by the participant’s minimum threshold. 
Both the participants with normal vision and the participants with AMD showed the 
same pattern of performance. There was a large effect of spacing such that closely 
spaced letters yielded higher thresholds (i.e., poorer performance). There was also a 
significant but small effect of serif size, such that serifs of 5% or 10% led to lower 
thresholds (i.e., better performance) than a sans serif typeface, which Arditi and 
Cho ascribed to the concomitant increase in spacing required to accommodate them. 
There was no evidence that the presence or absence of serifs made any difference to 
the performance of the participants with AMD. 

Subsequent researchers tried to devise typefaces that might be helpful for people 
with AMD. Bernard et al. (2016) developed Eido, a monospaced sans serif typeface 
that emphasised the distinctive shapes of different letters. They presented normally 
sighted volunteers with letters, words, and sentences in either Eido or the monospaced 
slab serif typeface Courier using a CRT screen. They carried out six different exper- 
iments with varying print sizes. In each case, the participants used their dominant or 
preferred eyes, but an eye-tracking system superimposed a mask over the centre of 
their visual field to simulate AMD. The participants made fewer errors when stimuli 
were presented in Eido than when presented in Courier, but there was no difference 
in their reading speed between the two typefaces. 

Xiong et al. (2018) used both Eido and Maxular Rx, a proportionally spaced slab 
serif typeface developed by Steven Skaggs that employed extra spacing between 
successive letters and lines. For comparison, they also used Courier, Helvetica, and 
Times Roman. (Helvetica is a proportionally spaced sans serif typeface, whereas 
Times Roman is a proportionally spaced serif typeface.) Individual sentences were 
presented on an LCD screen controlled by a Macintosh computer. The participants 
consisted of 19 individuals with AMD, 14 age-matched individuals with normal 
vision, and 26 young adults with normal vision. The researchers measured their 
fastest speed to read the sentences aloud, the smallest print size to achieve that 
speed, and the smallest print size that could just be read. The reading speed varied 
significantly across the five typefaces for the individuals with AMD, but not for the 
control participants. All three groups showed significant variations in the measures 
of print size. However, there were no systematic differences between the three serif 
typefaces (Courier, Maxular Rx, and Times Roman) and the two sans serif typefaces 
(Eido and Helvetica). 
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13.4 Conclusions 


As mentioned in Chap. 8, any differences in the legibility of serif and sans serif 
typefaces might become more apparent in readers whose visual systems are chal- 
lenged as the result of disablement. One study evaluated a heterogeneous sample of 
patients with visual impairment and found no difference in reading speed between a 
serif typeface and a sans serif typeface, regardless of whether text was presented on 
screen as a single paragraph or using the RSVP procedure. Another study evaluated 
a large sample of readers with dyslexia. This too found no difference in their reading 
speed between serif and sans serif typefaces. Any differences in the participants’ 
preferences or in their eye movements could be attributed to the researchers’ failure 
to control the inter-letter spacing of the different typefaces. Differences in inter-letter 
spacing also explain differences in reading speed in a study which simulated AMD 
in readers with normal vision. A study which compared reading in people with and 
without AMD found no systematic differences between serif typefaces and sans serif 
typefaces. 
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Chapter 14 A) 
Reading Text in Internet Browsers cre 


14.1 The Legibility of Serif and Sans Serif Typefaces 
in Internet Browsers 


The research studies described thus far have been mainly concerned with mate- 
rial generated on local workstations using conventional word-processing software. 
However, many readers view material that has been saved in hypertext markup 
language (HTML) on remote sites on the internet to be viewed in web browsers. Even 
so, this is not a hard-and-fast distinction. First, word-processed documents can be 
uploaded to remote web sites and retrieved by other users to read on their smartphones 
or tablet computers as well as their own workstations. Second, how downloaded docu- 
ments appear on-screen will depend on the browser settings and other software on 
the local device. Third, material can also be saved in HTML on local workstations 
and viewed through web browsers in order to mimic the retrieval of information from 
remote web sites. Even so, the question arises whether serif and sans serif typefaces 
differ in legibility when used in documents saved in HTML and viewed through 
web browsers. As with material that is generated on local workstations, designers 
and design educators tend to recommend that sans serif typefaces should be used on 
web sites due to the poor legibility of serif typefaces on low-resolution monitors or 
with small type sizes (Davidow, 2002). However, others maintain that a preference 
for sans serif typefaces for websites simply reflects readers’ greater familiarity with 
sans serif typefaces when accessing sources of information on the internet (Redich, 
2012, p. 62). 

Gosse (1999) asked 200 participants to read stories selected from the websites of 
real newspapers published outside the immediate locality. Eight stories were edited 
to a standard length of 325 words and presented on a laptop computer with a liquid 
crystal display (LCD) screen using a web browser as if they were being viewed on the 
World Wide Web. Four stories were presented in different serif typefaces (Courier, 
New Century Schoolbook, Palatino, and Times), and four were presented in different 
sans serif typefaces (Avant Garde, Hallmarke Light, Helvetica, and Quick Type). The 
stories were assigned at random to four pairs, each containing a story in serif typeface 
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and a story in sans serif typeface. Participants were timed while they read the two 
stories in a pair, and they were then asked a number of questions, including their 
preference between the two typefaces they had seen (pp. 67-75). 

There was no significant difference in either reading time or preference: serif 
passages were read in an average of 92.0 s, and sans serif passages were read in 
an average of 95.8 s; 102 of the participants preferred the serif typefaces, and 98 
preferred the sans serif typefaces (p. 81). Unfortunately, there were two problems 
with this study. First, the opening page on the web browser listed short titles of the 
eight stories, each presented in the appropriate typeface, and the participants were 
free to choose which of the stories they read first (p. 91). This then determined 
which story they read second and which typefaces the participant received. This 
contradicts the author’s assertion that the order of presentation of the different serif 
typefaces was “systematically randomized” (p. 69). Second, in the data analyses, the 
contrast between serif and sans serif typefaces was incorrectly treated as a between- 
subjects variable and not as a within-subjects variable (pp. 84-85), and this would 
have reduced the analyses’ statistical power. 

Grant and Branch (2000) asked 21 undergraduate student teachers to participate 
in an online experiment using a specially constructed website. The stimuli were 
two passages of 164 words taken from the Graduate Record Examination and two 
test questions presented on the same screen as the passages. Students who used a 
Windows platform were shown stimuli in Times New Roman and Arial; those who 
used a Macintosh platform were shown stimuli in Times and Helvetica. Students 
who accessed the website alternately received a serif typeface first or a sans serif 
typeface first; in both cases, all the stimuli were shown as 11-point black text on a 
white background. The system recorded the reading time for each passage, and the 
students were given feedback on their answers to the questions, but these were not 
recorded. The reading data were converted to words per minute and showed that the 
serif typefaces were read significantly more quickly than sans serif typefaces, but 
there was no significant practice effect between the two passages. Grant and Branch 
acknowledged that the number of participants in their study was relatively small 
and that they had compared just two typefaces in each participant. They also had no 
control over the platforms used by the participants. 


14.2 The Research of Bernard and Colleagues 


Michael Bernard and his colleagues carried out a series of experiments to evaluate 
the legibility of different typefaces when reading online material. They identified 
passages of text (typically around 1,000 words for young adult readers) and in each 
case replaced 15 randomly selected words with substitutes. The latter rhymed with 
the original words but were semantically inappropriate to the context. The passages 
were saved in HTML and were viewed on a high-resolution LCD monitor using a 
browser as if they had been retrieved from the internet. The participants were asked 
to read each passage silently but to say the inappropriate words aloud. They were 
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also asked to rate the different passages on several characteristics and to rank their 
overall preference among the different typefaces. 

This research was initially published in Usability News, a biannual newsletter 
that was produced by the Software Usability Research Laboratory at Wichita State 
University. Dyson (2005) argued that these reports could not be relied upon because 
they had not been peer reviewed. In fact, some of the reports were also published as 
articles in academic journals or conference proceedings, where they will certainly 
have been exposed to independent peer review. In such cases, I will cite both versions 
of these reports so that readers can make meaningful comparisons for themselves. 

Bernard and Mills (2000; Bernard et al., 2003) compared the presentation of 
material in either a serif typeface, Times New Roman, or a sans serif typeface, 
Arial, in either 10-point type or a 12-point type, and in either a aliased form or 
an anti-aliased form. There was no significant variation either in the participants’ 
detection of substitutes or in the time they had taken to read the different passages. 
However, the fastest reading time was obtained when the passages were presented in 
12-point aliased Times New Roman, and the slowest time was obtained when they 
were presented in 10-point anti-aliased Arial. There was no significant difference in 
perceived legibility between the passages presented in Arial and those presented in 
Times New Roman, but the passages presented in a 12-point Arial typeface (whether 
aliased or anti-aliased) were rated as sharper than the passages presented in 10-point 
anti-aliased Times New Roman. The passages presented in 12-point Arial (whether 
aliased or anti-aliased) or in 12-point aliased Times New Roman were the most 
preferred. 

Bernard, Mills, Peterson, and Storrer (200le) compared five serif typefaces 
(Century Schoolbook, Courier New, Georgia, Goudy Old Style, and Times New 
Roman), five sans serif typefaces (Agency FB, Arial, Comic Sans MS, Tahoma, and 
Verdana), and two ornate typefaces (Bradley Hand ITC and Monotype Corsiva). The 
typefaces were matched in terms of their body height and were mainly 12-point. 
(Whether they were aliased or anti-aliased was not specified.) There was significant 
variation in the time taken to read the different passages, with Corsiva yielding the 
fastest reading time and Tahoma the slowest. However, there was no overall difference 
in reading time between the serif typefaces and the sans serif typefaces. Moreover, 
when Bernard et al. constructed a measure of “reading efficiency” by dividing the 
percentage of substitutions that were detected by the overall reading time, there was 
no significant variation in this measure among the 12 typefaces. This suggests that any 
variation in reading time represented a trade-off between speed and accuracy rather 
than any genuine differences in reading efficiency. The participants’ perceptions 
showed significant variation, but again there was no overall difference between the 
serif and sans serif typefaces. Finally, Arial, Comic Sans, Tahoma, Verdana, Courier 
New, Georgia, and Century Schoolbook were ranked higher than other typefaces in 
terms of the participants’ overall preference. 

Bernard, Lida, Riley Hackler, and Janzen (2002a) compared eight different type- 
faces. Of the four serif typefaces, two, Courier New and Times New Roman, had 
originally been designed for print applications; one, Century Schoolbook, had been 
designed for educational materials; and one, Georgia, had been designed to be 
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displayed on computer screens. Of the four sans serif typefaces, two, Arial and Comic 
Sans MS, had originally been designed for print applications, and two, Tahoma and 
Verdana, had been designed to be displayed on computer screens. Different groups 
of participants were presented with passages in 10-point, 12-point, or 14-point type. 

Bernard et al. found that passages presented in Times New Roman or Arial were 
read significantly more quickly than those presented in Courier New, Georgia, or 
Century Schoolbook. Again, however, there was no overall difference in reading 
time between the serif and sans serif typefaces. The passages presented in 12-point 
type were read significantly more quickly than those presented in 10-point type. The 
researchers noted that passages which were read more quickly tended to be read 
less accurately: in other words, there was a speed—accuracy trade-off. This time, 
they calculated a measure of reading efficiency by dividing the overall reading time 
by the percentage of substitutions detected (in other words, the reciprocal of their 
previous measure of reading efficiency). On this measure, there was no significant 
variation in this measure among the eight typefaces. The participants’ perceptions 
again showed significant variation, but there was no systematic difference between 
either the ratings or the rankings of the serif typefaces and the sans serif typefaces. 

Bernard and colleagues carried out further experiments with both older and 
younger participants. Bernard, Liao, and Mills (2001b, c) tested 27 adults aged 
between 62 and 83. Passages of around 700 words were presented in two serif 
typefaces, Times New Roman and Georgia, or two sans serif typefaces, Arial and 
Verdana, in either 12-point or 14-point type. In each passage, ten randomly selected 
words had been replaced by substitutes that rhymed with the original words but were 
semantically inappropriate to the context. The participants were also asked to rate 
the different passages with regard to the perceived legibility of the typeface and to 
rank their overall preference of the typefaces. 

The passages presented in 12-point serif typefaces were read significantly less 
quickly than the passages presented in either 14-point serif typefaces or 14-point 
sans serif typefaces. When Bernard et al. constructed a measure of reading efficiency 
by dividing the percentage of substitutions detected by the overall reading time, the 
14-point passages yielded higher scores than the 12-point passages, but there was 
no significant variation in reading efficiency across the four typefaces. Similarly, 
the participants rated the 14-point passages as being more legible than the 12-point 
passages, but there was no significant variation in their ratings of the four typefaces. 
Finally, the 14-point sans serif typefaces were ranked higher than the 12-point sans 
serif typefaces and all of the serif typefaces in terms of the participants’ overall pref- 
erence. No significant differences were found on any measure between the typefaces 
designed for printing on paper (Times New Roman and Arial) and those designed 
for screen display (Georgia and Verdana). 

Bernard, Liao, Chaparro, and Chaparro (2001a) repeated this experiment with 
a new sample of 26 older adults in order to focus on their perceptions of different 
typefaces. After reading each passage, they rated it on 7-point scales in terms of its 
legibility, how easy it was to read, its sharpness and crispness, its attractiveness, and 
its personality. Finally, they ranked their overall preference of the typefaces. The 14- 
point passages obtained higher scores than the 12-point passages, although this was 
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mainly true for men, not for women. No significant differences were found among 
the four typefaces on any of the aspects of the participants’ perceptions. Overall, 
the participants ranked the sans serif typefaces higher than the serif typefaces, but 
there was no systematic difference between the ranks of the typefaces designed for 
printing on paper and the ranks of those designed for screen display. 

Bernard, Mills, Frank, and McKown (2001d; Bernard, Chaparro, Mills, & 
Halcomb, 2002b) tested 27 children aged between 9 and 11. Children’s short stories 
of about 580 words were presented in two serif typefaces, Times New Roman and 
Courier New, or two sans serif typefaces, Arial and Comic Sans MS, in either 12- 
point or 14-point type. In each story, 15 randomly selected words had been replaced 
by substitutes that rhymed with the original words but were semantically inappro- 
priate to the context. The participants were also required to rate the different stories 
with regard to how easy they were to read, whether they enabled them to read faster, 
the attractiveness of the typeface, and whether they would like their schoolbooks to 
use the typeface. Finally, they ranked their overall preference of the eight typefaces. 

There were no significant differences among the four typefaces and the two 
type sizes in terms of either the detection of substitutes or the speed of reading. 
Bernard et al. computed a measure of reading efficiency by dividing the reading 
time by the percentage of substitutes detected (so that lower scores implied higher 
efficiency). The only significant difference was that reading efficiency was less on 
stories presented in Courier New than on stories presented in the other typefaces. 
The stories presented in 14-point type were rated as significantly better than those 
presented in 12-point type in terms of their ease of reading, reading more quickly, 
their attractiveness, and their use in schoolbooks. Stories presented in Times New 
Roman were rated as being less easy to read than those presented in Arial or Comic 
Sans; stories presented in Times New Roman were rated as being less attractive than 
those presented in Comic Sans; and those presented in Times New Roman or Courier 
New were rated as less desirable for use in schoolbooks. Among the 14-point type- 
faces, Arial and Comic Sans were ranked higher in overall preference than Courier 
New or Times New Roman; among the 12-point typefaces, Comic Sans was ranked 
higher in overall preference than the other typefaces. 


14.3 Subsequent Research 


Myung (2003) presented 12 Korean students with newspaper stories of between 
453 and 532 words using the internet browser installed on a personal computer. 
The stories were shown in three typefaces: one serif typeface (Batang) and two sans 
serif typefaces (Dodum and Gulim). The participants were asked to read the stories to 
themselves and then to rate their typographical appearance on a 7-point scale. Myung 
calculated a measure of reading speed by dividing the total number of characters in 
each story by the time taken to read it. There was no significant variation among the 
three typefaces in terms of the participants’ reading speed. However, the application 
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of conjoint analysis to the participants’ preference ratings showed that the stories that 
had been printed in Dodum and Gulim were preferred to those printed in Batang. 

Ling and van Schaik (2006) carried out two experiments to examine the influence 
of typeface and line length on students’ use of web pages. In both cases, the web 
pages were presented in either a serif typeface (12-point Times New Roman) or a 
sans serif typeface (10-point Arial) and in four different line lengths. In their first 
experiment, 72 participants had to say whether or not a mock web page presented 
in a browser contained a specified hyperlink. There were no significant differences 
between the students who saw web pages in the serif typeface and those who saw 
web pages in the sans serif typeface in either the accuracy or the response time for 
hits or in either the accuracy or the response time for correct rejections. 

In their second experiment, 99 participants had to answer questions based upon 
the information contained in five mock web sites, each consisting of 30 pages, on 
various topics. The proportion of correct answers approached 100%. There were no 
significant differences between the students who saw web sites in the serif typeface 
and those who saw web sites in the sans serif typeface in either the time taken to 
carry out their task or the number of web pages that they visited to find the correct 
answers. Ling and van Schaik concluded that there was no difference between the 
serif typeface and the sans serif typeface either in visual search or in information 
retrieval. 

After both experiments, the participants were asked to express a preference 
between the two typefaces and to rate their aesthetic value on a 10-point scale. 
In the first experiment, regardless of which typeface they had seen, the participants 
tended to prefer Arial rather than Times New Roman and to rate Arial more highly 
than Times New Roman in terms of aesthetic value, although the latter difference 
was small in magnitude and unlikely to be of any practical importance. In the second 
experiment, there were no significant differences in either the participants’ preference 
or in their ratings of aesthetic value. 

Chernecky et al. (2006) recruited 22 cancer patients. The patients were assigned 
to workstations in groups of two or three but recorded their individual responses 
on a prepared form. The stimuli were presented using an internet browser, but the 
computers and monitors used were not specified. In one section of the test, the patients 
were presented with examples of text in varying sizes and in different typefaces with 
different backgrounds, two at a time. In each case, they indicated which of the two 
displays that they preferred. The strongest preference was for the serif typeface 
Times New Roman in a ten-point font and in blue lettering on either a tan or white 
background. The next strongest preference was for the sans serif typeface Arial in 
a nine-point font and black lettering on a tan background. However, the sans serif 
typeface Verdana was not preferred. Chernecky et al. ascribed the preference for the 
serif typeface to the fact that it was widely used in books, newspapers, and magazines. 
They did not carry out any kind of statistical analysis of their results, and they did 
not include any comparison group, and so it is unclear whether their findings were 
peculiar to cancer patients or would generalise to other kinds of participant. 

In a study mentioned in Sect. 12.5, Shaikh et al. (2006) obtained participants’ 
ratings of the appropriateness of different typefaces for various online purposes. 
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Fox et al. (2007) selected three of these typefaces judged to be of high, medium, 
or low appropriateness for each of three purposes: a business document, an e-mail 
message, and a narrative for young people. A total of 120 participants were presented 
with an example of each document (a bank letter, an e-mail invitation to a company 
picnic, and an explanation of how fireworks work) in one of the three typefaces. 
Each was presented as an HTML web page and was viewed using a web browser. 
The participants were asked to rate the “personality” of the document using 15 bipolar 
scales from Shaikh et al.’s (2006) study and to rate its “ethos” (their perceptions of 
the author and the intended readership) using five scales. 

The choice of typeface had no significant effect on the participants’ perceptions 
of the business document, except that its author was viewed as less mature if the 
least appropriate typeface was used. If the least appropriate typeface was used for 
the e-mail message, it was viewed as less stable, less practical, more rebellious, 
more youthful, and more feminine; its author was also viewed as less believable, 
less professional, less trustworthy, and less mature. The choice of typeface had no 
significant effect on perceptions of the narrative for young people, except that it 
was viewed as more youthful and more casual if the most appropriate typeface was 
used. Fox et al. concluded that in general on-screen documents were more likely to 
be perceived in a negative manner if they were presented using a less appropriate 
typeface. 

Beymer et al. (2008) carried out a similar study. They presented 82 employees of 
a computer company with one-page stories taken from a science news website. They 
were presented on a computer screen as a series of web pages in a 12-point anti-aliased 
typeface: half of the participants saw the stories in the serif typeface Georgia, and half 
saw them in the sans serif typeface Helvetica. Their eye movements were monitored 
while they read the stories, and a multiple-choice test was administered after each 
story to check their retention. There was no significant difference between the two 
subgroups in their reading speed, in a variety of statistics relating to their eye move- 
ments, or in their retention. Beymer et al. noted that around half of their participants 
reported having a first language other than English. Having found significant differ- 
ences in eye movements related to the participants’ age, they focused on those aged 
30-39. The participants for whom English was the first language produced shorter 
fixations and longer eye movements than those who had some other first language. 
However, neither group showed any differences between the two typefaces. 

Ali et al. (2013) compared the legibility of serif and sans serif typefaces in 48 
Malaysian students reading texts of moderately high difficulty containing 140 words 
in the Malay language. These were presented in a web browser on an LCD monitor 
in a 12-point typeface. For the first 24 participants, the texts were presented in two 
typefaces designed for screen presentation: the serif typeface Georgia and the sans 
serif typeface Verdana. For the second 24 participants, the texts were presented in 
two typefaces designed for printed media: the serif typeface Times New Roman and 
the sans serif typeface Arial. The participants were required to read the texts aloud 
as quickly and accurately as possible, and their performance was monitored by two 
research assistants. Their reading speed and their accuracy were both mapped onto 
scales from 1 to 5, and the results were added together to yield an overall score. 
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There was no sign of any difference in performance either between Georgia and 
Verdana or between Times New Roman and Arial. Two problems with this study 
are that all the students read the texts in the same sequence and that all saw the two 
typefaces in the same sequence; hence, there was no control for transfer effects (such 
as the positive effect of practice or the negative effect of fatigue). The researchers 
also carried out independent sample tests when the observations had been obtained 
by repeated testing of the same participants, and this would once again have reduced 
the analyses’ statistical power. 

Mátrai and Kosztyan (2014) devised web pages containing verbal comprehension 
tasks and compared text presented in the serif typeface Times New Roman and text 
presented in the sans serif typeface Arial. In addition, they manipulated the size of 
the text (three sizes for each typeface), the line length and spacing, the colour of 
the background, and the alignment of the text, which yielded a total of 144 condi- 
tions. Each of 125 university students was asked to solve the tasks on a sample of 
40 web pages. (The computers and the monitors were not specified.) A regression 
analysis found no significant difference between the two typefaces in terms of the 
response latencies. Mátrai and Kosztyan stated that there was a significant difference 
in terms of the proportions of correct responses, but they did not provide any further 
information. In fact, the overall difference was relatively slight (Times New Roman, 
84.0%; Arial, 83.3%) and unlikely to be of any practical importance. Finally, the 
students were asked to express a preference among the different displays, but there 
was no significant difference in their preference between the two typefaces. A funda- 
mental problem with this study is that Mátrai and Kosztyan assigned different verbal 
comprehension tasks to different conditions, but they failed to evaluate whether the 
tasks were of equal difficulty. Consequently, even the small difference that they found 
between the two typefaces in terms of the proportions of correct responses might have 
been due to differences in the difficulty of the relevant tasks rather than to differences 
in the legibility of the typefaces. 

Beyon and Cox-Boyd (2020) carried out a follow-up to the study mentioned in 
Sect. 6.4 by Gasser et al. (2005), who varied the typeface used when participants were 
reading text from paper. Beyon and Cox-Boyd used a text concerning spinal health. 
They presented this text in four different typefaces: a monospaced slab serif typeface 
(Courier New), a monospaced sans serif typeface (Lucida Console), a proportionally 
spaced serif typeface (Palatino Linotype), and a proportionally spaced sans serif 
typeface (Arial). Independent of this, they presented the text in black, blue, or red, 
yielding 12 different conditions. They recruited volunteers from an online website 
(Amazon Mechanical Turk), who were asked to carry out the task as an online survey. 
They were asked to read the text, complete a questionnaire about their attitudes to 
spinal health as a distractor task, and then answer six questions to test their retention 
of the key information contained in the original document. The participants were 
randomly assigned to one of the 12 presentation conditions. Beyon and Cox-Boyd 
found no significant differences in performance among the four typefaces or the three 
type colours. Beyon and Cox-Boyd acknowledged that they had no control over the 
devices or platforms which the participants had used to carry out the task. 
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14.4 Conclusions 


This chapter discussed whether serif and sans serif typefaces differ in their legibility 
when the material is saved in HTML and viewed on-screen through web browsers. 
This includes material saved in local workstations as well as material retrieved from 
the internet. In addition to a variety of individual studies, the chapter described a 
research programme that was carried out by Bernard and colleagues at Wichita State 
University. Further research has been carried out into the use of different typefaces for 
various online purposes. When reading material in internet browsers, by far the most 
common finding is that there is no significant difference between serif typefaces and 
sans serif typefaces in terms of the users’ reading comprehension, reading speed, or 
reading accuracy. There is also no consistent evidence that readers have a preference 
between serif typefaces and sans serif typefaces when reading material in internet 
browsers. Once again, both serif and sans serif typefaces are regarded as being broadly 
appropriate for internet sites, whereas display and cursive typefaces are regarded as 
being generally inappropriate for serious use. 
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General Conclusions to Part II E 


15.1 Key Findings from Part II 


Once again, as was mentioned in Sect. 1.4, Part I of this book has reviewed diverse 
studies using diverse methods of data collection, and this precludes any formal meta- 
analysis to integrate the findings. One must instead focus on the most common 
finding—the modal finding—regarding the legibility of serif and sans serif typefaces: 
superiority of serif typefaces; superiority of sans serif typefaces; or no difference. 
Part II has been concerned with the question of whether there are differences in the 
legibility of serif and sans serif typefaces when they are used to produce material to 
be read on computer monitors or other screens. 

Studies of the legibility of letters and words have proved inconclusive (Sect. 11.1). 
The studies that employed authentic typefaces showed at most that some sans serif 
typefaces (Consolas, Letter Gothic, and Verdana) are more legible than some serif 
typefaces (Times New Roman and the slab serif typeface Courier). In addition, 
researchers who employed artificial typefaces confounded the presence or absence 
of serifs with variations in the width of the letters and variations in the spacing among 
successive letters. Early studies using the tachistoscopic presentation of printed 
letters and words showed that errors in their identification were often the result 
of confusions among visually similar letters (see Sect. 4.2), and this idea has been 
confirmed using screen-based presentation (Sect. 11.2). However, such confusions 
seem to be due to the design of individual characters in specific typefaces rather 
than to the presence or absence of serifs. Indeed, visual confusions are not more 
likely with sans serif typefaces than with serif typefaces, which contradicts the old 
hypothesis that serifs make letters easier to discriminate and identify (Legros, 1922, 
p. 11). 

Early research using print-based presentation suggested that visual confusions 
were much less important when reading connected sentences (Vernon, 1929), but this 
notion does not seem to have been tested using presentation on computer screens. 
Five studies compared the speed with which participants read sentences displayed 
in serif and sans serif typefaces (Sect. 12.1): one found that serif typefaces were 
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read more quickly, two found that sans serif typefaces were read more quickly, 
while two found no significant difference. Three of these studies also compared the 
participants’ accuracy: one found that sans serif typefaces were read more accurately, 
while two found no significant difference. Research into readers’ eye movements has 
not yielded consistent findings with regard to the presence or absence of serifs. 

Five studies compared readers’ comprehension of meaningful text when presented 
on computer screens in serif and sans serif typefaces (Sect. 12.2). Four of the studies 
found no significant difference in the time taken to read the material in question. 
The fifth (Hojjati & Muniandy, 2014) found that material displayed in a sans serif 
typeface was read more quickly, but this study suffered from serious methodological 
problems. Three of the studies reported measures of comprehension: one found an 
advantage for a serif typeface, one found an advantage for a sans serif typeface, and 
the third found no significant difference. 

The rapid serial visual presentation (RSVP) procedure constitutes another means 
of presenting meaningful material, and five studies have compared readers’ perfor- 
mance with serif and sans serif typefaces in terms of their comprehension, speed, or 
accuracy (Sect. 12.3). All five studies found no significant difference between serif 
and sans serif typefaces, except that serif styles were less legible with very small 
type or under conditions of low luminance (when, of course, serifs are likely to have 
been faint or even completely invisible). Two Korean studies compared the legibility 
of serif typefaces and sans serif typefaces when viewed on the screen of a personal 
digital assistant or a smartphone, but neither study found any significant difference 
in the participants’ reading performance (Sect. 12.4). 

Differences in the legibility of serif and sans serif typefaces are not apparent even 
in readers whose visual systems are challenged as the result of disablement. Different 
studies have examined a heterogeneous sample of patients with visual impairment 
(Sect. 13.1), a large sample of readers with dyslexia (Sect. 13.2), and people with and 
without age-related macular degeneration (Sect. 13.3). In each case, there was no 
difference in the participants’ reading speed between serif typefaces and sans serif 
typefaces. Any differences in the participants’ preferences or eye movements could 
be attributed to the researchers’ failure to control the width or inter-letter spacing of 
different typefaces. 

Finally, when reading material in internet browsers, by far the most common 
finding is that there is no significant difference between serif typefaces and sans serif 
typefaces in terms of the users’ reading comprehension, reading speed, or reading 
accuracy (Chap. 14). This applies regardless of whether or not researchers have used 
measures of reading “efficiency” to control for the possibility that readers employ 
some kind of trade-off between their speed and their accuracy. There is also no 
consistent evidence that readers have a preference between serif typefaces and sans 
serif typefaces when reading material in internet browsers. Once again, both serif 
and sans serif typefaces are regarded as being broadly appropriate for internet sites, 
whereas display and cursive typefaces are regarded as being generally inappropriate 
for serious use. 
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15.2 Preferences and Connotations 


Some studies, though not all, have found that readers express a preference for sans 
serif typefaces when reading on computer screens, but in general both serif and sans 
serif typefaces are regarded by users as appropriate for online purposes (Sect. 12.5). 

There is some evidence that serif and sans serif typefaces differ in their connota- 
tions or “personality”. This seems to reflect variations in reader’ expectations, which 
in turn depend on their prior experience and familiarity with different typefaces. 
When reading on computer screens, these differences are small in magnitude and are 
often not statistically significant. One study (Kaspar et al., 2015) found that scientific 
abstracts were rated more positively in a serif typeface than in a sans serif typeface 
when artificial typefaces were used, but the reverse was true when authentic typefaces 
were used. This suggests that other features can override any effect of the presence 
or absence of serifs. 

One limitation of the latter study is that the ratings were provided by students 
and not by experienced teachers or researchers. Nevertheless, it raises the possibility 
that evaluations of academic writing may be influenced by readers’ preferences and 
the connotations of serif and sans serif typefaces. Section 9.2 discussed the possible 
implications of this notion in the context of reading from paper, but similar points 
can be made about reading on-screen: 


e In the context of academic publication, authors will want to be assured that their 
work is evaluated in terms of its content rather than in terms of its typographical 
appearance. The findings of Kaspar et al. (2015) indicate that on-screen evalua- 
tions of scientific abstracts can be influenced by the typeface in which they are 
presented. It is reasonable to assume that the same applies to on-screen evaluations 
of entire articles or books (although there seems to be no empirical evidence on 
this matter). One solution to this problem is for academic journals and publishers 
to require that manuscripts should be submitted for publication in a standard 
typeface so that they can be compared on a like-for-like basis. 

e Inthe context of academic assessment, it is possible that teachers and other asses- 
sors will give more positive on-screen evaluations of students’ online assignments 
if the teachers and their students share the same typographical preferences than if 
they differ in those preferences (although once again there seems to be no empir- 
ical evidence on this matter). It would be both useful and fairer in the interests 
of ensuring valid assessment if teachers responsible for particular course units 
(and, ideally, for entire degree programmes) could agree on their typographical 
preferences and make these known to their students. 


Of course, in both contexts these variations in readers’ expectations and prefer- 
ences might depend on their prior experience and familiarity with different typefaces 
rather than on any intrinsic properties of the typefaces themselves (and there is empir- 
ical evidence on this point in the case of reading from paper, if not in the case of 
reading from screens: see Sect. 9.2). Even so, there is a need for research on the extent 
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to which reviewers’ on-screen evaluations of academic manuscripts and teachers’ on- 
screen evaluations of their students’ assignments are affected by the reviewers’ and 
teachers’ preferences and expectations. 


15.3 Implications for Previous Assumptions 


Where does this leave the recommendations of designers and design educators who 
have traditionally advocated the use of sans serif typefaces for material presented 
on-screen? Poncelet and Proctor (1993, p. 101) simply stated without qualification or 
supporting evidence: “Often san-serif fonts work better on the computer screen than 
serif fonts.” This assertion is clearly not supported by the research evidence reviewed 
in Part II. Similarly, Schriver (1997) claimed without qualification or supporting 
evidence that sans serif was “the preferred style of type for online because of its 
simple, highly legible, modern appearance” (p. 508). Her claims about the appear- 
ance of sans serif typefaces are partially supported by the results of research on the 
connotations of different typefaces, but they do not appear to translate into objective 
differences in their legibility. 

Universal design is an approach to the development of educational websites that 
aims to ensure accessibility for all students instead of implementing ad hoc adjust- 
ments for those with particular disabilities. Proponents of universal design have 
advocated: “Use Sans Serif fonts for text. Letters with serifs are difficult to read on- 
screen and can create visual fatigue when large amounts of text are included on web 
sites” (“Universal Design”, 1999, p. 6). In general, there is no support for the idea 
that serif typefaces are harder to read on-screen than sans serif typefaces. One might 
expect that such effects would be more evident in readers with visual impairment 
under the high demands of the RSVP procedure, yet two studies failed to find any 
significant difference in performance between serif typefaces and sans serif type- 
faces in such readers. The quoted text refers specifically to material included on web 
sites, yet there is no evidence for differences between serif typefaces and sans serif 
typefaces when reading from internet browsers. 

In fact, the notion of universal design has had its critics. For instance, Raymaker 
et al. (2019, p. 148) argued that it was 


ultimately impractical due to the fact that access needs can conflict with each other. For 
example, some guidelines intended for people with intellectual disability . . . recommend 
simplifying vocabulary, which—if implemented without retaining the precision afforded by 
more complex wording—can make language pragmatics more difficult for autistic users to 
understand. . . . Likewise, high-contrast color schemes suitable for people with low vision 
may be painful or unreadable to autistic users with hypersensitive vision 


Even so, Raymaker et al. concurred that websites designed for people with autism 
should “use a plain accessible sans-serif font (e.g., Arial) for ease of readabil- 
ity” (p. 147). Once again, they provided no empirical evidence to support this 
recommendation, and so we are back in the world of “everybody knows”. 
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15.4 Conclusions 


This chapter concludes Part II by summarising and discussing the key findings. 
Studies of the legibility of letters and words when presented on computer monitors 
or other screens have failed to yield a consistent pattern of results, as have studies of 
the legibility of connected sentences, even when using the RSVP procedure. With 
regard to readers’ comprehension of meaningful text when presented on computer 
screens, the modal finding is that of no significant difference in reading speed between 
serif and sans serif typefaces. The studies that reported measures of accuracy in this 
situation failed to show a consistent pattern of results. Some studies have found that 
readers express a preference for sans serif typefaces, and there is some evidence that 
serif and sans serif typefaces differ in their connotations; however, these findings can 
be attributed to readers’ previous experience of reading text on-screen. Previously 
stated assumptions about the legibility of serif and sans serif typefaces when used 
to present material on computer screens are not supported by empirical research 
findings. In short, despite various assertions and recommendations, the conclusion 
of Part II is that there is no difference in the legibility of serif typefaces and sans 
serif typefaces when they are used to produce material that is presented on computer 
monitors or other screens. 
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Both serif typefaces and sans serif typefaces have a long history, going back to 
inscriptions in Ancient Rome and probably earlier (see Sects. 1.2 and 1.3). Histor- 
ically, serif typefaces were used in printing earlier than sans serif typefaces, and 
they have usually been preferred to sans serif typefaces for use in formal documents. 
Section 1.2 described a number of theories aimed at explaining why serifs should 
have survived in modern typography and indeed at explaining why serif typefaces 
should be more legible than sans serif typefaces when reading from paper: (a) that 
serifs constitute additional visual cues to support the reader’s gaze; (b) that they over- 
come the harmful effects of irradiation; and (c) that they facilitate the operation of 
line detectors in the human visual system. However, none of these explanations has 
proved especially convincing in the light of subsequent arguments and evidence. In 
the case of reading from screens, it was argued that serifs and other details might be 
lost when reading from low-resolution monitors, but this is not a plausible argument 
now that high-resolution monitors are widely available. 

A different approach is to claim that serif and sans serif typefaces do not differ in 
their legibility because of inherent properties of serifs themselves, but that the pres- 
ence or absence of serifs serves as a proxy for some other property of typefaces. This 
approach was adopted by Wilkins et al. (2020; see Sect. 11.2), who suggested that 
serifs tended to accentuate the spatial periodicity of letter strokes (i.e., the vertical 
“stripiness” of words), as measured by the first peak in a word’s horizontal autocor- 
relation. Wilkins et al. showed that words of low vertical stripiness are read more 
quickly than words of high vertical stripiness, and that serif typefaces tend to have 
higher stripiness than sans serif typefaces, but they did not show that this leads to 
variations in how quickly words in different typefaces are read. For the moment, the 
theoretical and practical implications of vertical stripiness must remain unclear and 
contentious. More important, the arguments put forward by Wilkins et al. entail that 
serif typefaces should be less legible than sans serif typefaces both when reading 
from paper and when reading from screens, which is not position that has been 
adopted to date by any other researchers. 
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In contrast, there is a good deal of evidence about the negative effects of horizontal 
stripiness—the extent to which successive lines of text tend to resemble horizontal 
stripes. As mentioned in Sect. 4.1, horizontal stripes are known to induce eye strain, 
visual illusions, headache, and even seizures. Wilkins et al. (1984) used the Michelson 
contrast to measure the horizontal “stripiness” of visual displays: this is defined as 
the difference in the luminances of the light and dark sections of a pattern divided by 
the sum of their luminances. In the case of horizontal stripes, Wilkins et al. found a 
clear relationship between this measure and the probability of illusions and seizures. 
Wilkins and Nimmo-Smith (1987) applied the measure to lines of printed text and 
found that it was inversely related to readers’ ratings of the clarity and comfort of 
the material. More recently, Wilkins et al. (2020) applied Fourier analysis to text 
displayed on an LCD screen and obtained similar results. In both studies, the clarity 
and comfort of the text appeared to be improved by increasing the spacing between the 
lines of text relative to the x-height of the typeface. However, Wilkins and Nimmo- 
Smith (1987) had compared the mean Michelson contrast of samples of 24 serif 
typefaces and seven sans serif typefaces: they found a highly significant variation 
across the 31 typefaces, but they did not find any systematic difference between the 
serif typefaces and the sans serif typefaces. While clarity and comfort may be very 
important characteristics of text presented on paper or on screens, there seems to be 
no difference between serif and sans serif typefaces in these characteristics. 

In fact, the copious evidence that has been reviewed in this monograph leads to the 
conclusion that there is no difference in the legibility of serif and sans serif typefaces 
either when reading from paper or when reading from screens. This contradicts 
various assertions made over the last 100 years by typographers, designers, and 
other authority figures. What this means is that assertions to the effect that “everyone 
knows” that such-and-such should not be accepted on the basis of the authority or 
status of the people or organisations making them but should be regarded merely 
as conjectures that might well be subject to refutation through carefully designed 
empirical research (cf. Popper, 1959, 1962.) Of the large number of studies that I 
have reviewed in this monograph, some have been more carefully designed than 
others, and I have identified at least some of the design flaws that are apparent in 
previous research. Nevertheless, the overwhelming thrust of the available evidence is 
that there is no difference in the legibility of serif typefaces and sans serif typefaces 
either when reading from paper or when reading from screens. Typographers and 
software designers should feel able to make full use of both serif typefaces and sans 
serif typefaces, even if legibility is a key criterion in their choice. 
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